Let’s be honest. Writing code in 2025 is far easier than it was ten, or even five, years ago.
We moved from Fortran to C to Python, each step lowering the effort needed to get something working. Now tools like Cursor and GitHub Copilot can write boilerplate, refactor functions, and improve coding pipelines from a few lines of natural language.
At the same time, more people than ever are getting into AI, data science, and machine learning. Product managers, analysts, biologists, economists, you name it, are learning how to code, understand how AI models work, and interpret data effectively.
All of this to say:
The real difference between a Senior and a Junior Data Scientist is no longer their coding level.
Don’t get me wrong. The difference is still technical. It still relies on understanding data, statistics, and modeling. But it is no longer about being the one who can invert a binary tree on a whiteboard or solve an algorithm in O(n).
Throughout my career, I have worked with some outstanding data scientists across different fields. Over time, I began to notice a pattern in how the senior data professionals approached problems, and it wasn’t about the specific models they adopted or their coding abilities: it’s about the structured and organized workflow they follow to turn a non-existent product into a robust, data-driven solution.
In this article, I’ll describe the workflow that Senior Data Scientists use when developing a DS product or feature. Senior Data Scientists:
- Map the ecosystem before touching code
- Think about DS products like operators
- Design the system end-to-end with “pen and paper”
- Start simple, then earn the right to add complexity
- Interrogate metrics and outputs
- Tune the outputs to the audience and choose the right tools to present their work
Throughout the article, I’ll expand on each one of these points. My goal is that, by the end of this article, you’ll be able to apply these six stages on your own, so you can think like a Senior Data Scientist in your everyday work.
Let’s start!
Mapping the ecosystem
I get it, data professionals like us fall in love with the “data science core” of a product. We enjoy building models, trying different architectures, tinkering with hyperparameters, or testing new techniques. After all, that is also how most of us were trained. At university, the focus is on the technique, not the environment where that technique will live.
However, Senior Data Scientists know that in real products, the model is just one piece of a larger system. Around it there’s an entire ecosystem where the product needs to be integrated. If you ignore this context, you can easily build something clever that doesn’t actually matter.
Understanding this ecosystem starts with asking questions like:
- What exact problem are we improving, and how is it solved today?
- Who will use this model, and how will it change their daily work?
- What does “better” look like in practice from a business perspective (fewer tickets, more revenue, less manual review)?
In a few words, before doing any coding or system design, it’s crucial to understand what the product is bringing to the table.
Your answer, coming out of this step, will sound like this:
[My data product] aims to improve feature [A] for product [X] in system [Y]. The data science product will improve [Z]. You expect to gain [Q], improve [R], and reduce [T].
Think about DS products like operators
Okay, now that we have a clear understanding of the ecosystem, we can start thinking about the data product.
This is an exercise in switching chairs with the actual user. If we are the user of this product, what does our experience with the product look like?
To answer this, we need to address questions like:
- What is the metric of satisfaction (i.e. success/failure) for the product? What are the optimal, non-optimal, and worst cases?
- How long is it acceptable to wait? Is it a few minutes, ten seconds, or real time?
- What is the budget for this product? How much is it acceptable to spend on it?
- What happens when the system fails? Can we fall back to a rule-based decision, ask the user for more information, or simply show “no result”? What’s the safest default?

As you may notice, we’re getting into the realm of system design, but we’re not quite there yet. This is more of a preliminary phase where we determine all the constraints, limits, and functionality of the system.
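To make the “failure modes and safest default” point concrete, here is a minimal sketch of what this operator mindset can turn into once you do reach the code stage. All the names (`model`, `rule_based_estimate`, the feature keys) are hypothetical, not part of any specific product:

```python
# Minimal sketch: wrap the model call with the "safest default" agreed on upfront.
# `model`, `rule_based_estimate`, and the feature names are hypothetical.

def rule_based_estimate(features: dict) -> float:
    # Placeholder rule the business already trusts, used only as a fallback.
    return 1.0 if features.get("past_incidents", 0) > 0 else 0.0


def predict_with_fallback(model, features: dict) -> dict:
    """Return the model prediction, or the rule-based fallback if anything fails."""
    try:
        score = model.predict_proba([list(features.values())])[0][1]
        return {"score": float(score), "source": "model"}
    except Exception:
        # Safest default: degrade gracefully instead of breaking the product.
        return {"score": rule_based_estimate(features), "source": "rule_based_fallback"}
```

The exact fallback does not matter here; what matters is that it was decided in this phase, not improvised after the first production incident.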
Design the system end-to-end with “pen and paper”
Okay, now we have:
- A full understanding of the ecosystem where our product will sit.
- A full grasp of the DS product’s required performance and constraints.
So we have everything we need to start the System Design* phase.
In a nutshell, we use everything we have learned so far to determine the following (see the sketch after this list):
- The input and output
- The Machine Learning architecture we can use
- How the training and test data will be built
- The metrics we’re going to use to train and evaluate the model.
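One lightweight way to capture these four points, besides a diagram, is as a plain, version-controllable spec. This is only a sketch, and every field value below is a hypothetical example, not a recommendation:

```python
# Minimal sketch of a "pen and paper" design spec captured as plain data.
# All field values are hypothetical examples.
from dataclasses import dataclass


@dataclass
class SystemDesignSpec:
    inputs: list[str]            # what the system receives
    output: str                  # what it returns
    model_candidates: list[str]  # architectures worth trying, simplest first
    training_data: str           # how train/test data will be built
    metrics: list[str]           # what we optimize and report


spec = SystemDesignSpec(
    inputs=["ticket_text", "customer_tier"],
    output="probability that the ticket needs manual review",
    model_candidates=["logistic_regression", "gradient_boosting", "fine_tuned_transformer"],
    training_data="last 12 months of tickets, time-based train/test split",
    metrics=["precision", "recall", "AUC"],
)
```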
Tools you can use to brainstorm this part are Figma and Excalidraw. For reference, this image represents a piece of the System Design (the model part, point 2 of the list above) drawn in Excalidraw.

Now, this is where the real skills of a Senior Data Scientist emerge. All the information you have gathered so far must converge into your system. Do you have a small budget? Then training a 70B-parameter DL architecture is probably not a good idea. Do you need low latency? Then batch processing is not an option. Do you need a complex NLP application where context matters and you have a limited dataset? Maybe LLMs can be an option.
Keep in mind that this is still only “pen and paper”: no code is written just yet. However, at this point, we have a clear understanding of what we need to build and how. NOW, and only now, can we start coding.
*System Design is a huge topic in itself, and treating it in less than 10 minutes is basically impossible. If you want to go deeper, a course I highly recommend is the one by ByteByteGo.
Start simple, then earn the right to add complexity
When a Senior Data Scientist works on the modeling, the fanciest, strongest, and most complicated Machine Learning models are usually the last ones they try.
The standard workflow follows these steps:
- Try to perform the task manually: what would you do if you (not the machine) were to do the task?
- Engineer the features: based on what you learned from the previous point (1), what are the features you’d consider? Can you craft some features to perform the task efficiently?
- Start simple: try a reasonably simple*, traditional machine learning model, for example, a Random Forest/Logistic Regression for classification or Linear/Polynomial Regression for regression tasks. If it is not accurate enough, build your way up.
When I say “build your way up”, this is what I mean:

In a few words: we only increase the complexity when necessary. Remember: we are not trying to impress anyone with the latest technology, we are trying to build a robust and functional data-driven product.
When I say “reasonably simple” I mean that, for certain complex problems, some very basic Machine Learning algorithms might already be out of the picture. For example, if you have to build a complex NLP application, you will most likely never use Logistic Regression, and it’s safe to start from a more complex architecture from Hugging Face (e.g. BERT).
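As a minimal sketch of the “start simple” step, this is roughly what the first baseline can look like, with a synthetic dataset standing in for the real problem:

```python
# Minimal "start simple" baseline; synthetic data stands in for a real problem.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

baseline = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
baseline_f1 = f1_score(y_test, baseline.predict(X_test))
print(f"Baseline F1: {baseline_f1:.3f}")

# Only if this number is not good enough for the product constraints do we
# "build our way up" to gradient boosting, deep learning, or transformers.
```

The baseline also gives you a number to beat: every more complex model now has to justify its extra cost against this result.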
Interrogate metrics and outputs
One of the key differences between a senior figure and a more junior professional is the way they look at the model output.
Usually, Senior Data Scientists spend a lot of time manually reviewing the output. This is because manual evaluation is one of the first things that Product Managers (the people Senior Data Scientists will share their work with) do when they want to get a grasp of the model performance. For this reason, it is important that the model output looks good from a manual evaluation standpoint. Moreover, by reviewing hundreds or thousands of cases manually, you can spot the cases where your algorithm fails. This gives you a starting point to improve your model if necessary.
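A minimal sketch of what this manual review can look like in practice (the column names and rows are hypothetical): pull the cases where the model and the label disagree, sample some, and read them one by one.

```python
# Minimal sketch of a manual-review pass; columns and rows are hypothetical.
import pandas as pd

results = pd.DataFrame({
    "text": ["refund not received", "love the product", "app keeps crashing"],
    "label": [1, 0, 1],
    "prediction": [0, 0, 1],
})

# Focus the manual review on the disagreements first.
errors = results[results["label"] != results["prediction"]]
for _, row in errors.sample(min(len(errors), 50), random_state=0).iterrows():
    print(f"[label={row.label} pred={row.prediction}] {row.text}")
```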
Of course, that’s just the start. The next important step is to choose the most appropriate metrics for a quantitative evaluation. For example, do we want our model to properly represent all the classes/decisions in the dataset? Then recall is very important. Do we want our model to be extremely precise when it makes a classification, even at the cost of sacrificing some data coverage? Then we’re prioritizing precision. Do we want both? AUC/F1 scores are our best bet.
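Continuing the sketch above with hypothetical labels and predictions, the quantitative side is usually a handful of scikit-learn calls; which one becomes “the” benchmark depends on the product goal:

```python
# Minimal sketch of the quantitative evaluation that follows the manual review.
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # hypothetical labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]                    # hypothetical hard predictions
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]   # hypothetical probabilities

print("precision:", precision_score(y_true, y_pred))  # how often a positive call is right
print("recall:   ", recall_score(y_true, y_pred))     # how much of the positive class we cover
print("f1:       ", f1_score(y_true, y_pred))         # balance of the two
print("auc:      ", roc_auc_score(y_true, y_score))   # ranking quality across thresholds
```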
In a few words: the best data scientists know exactly which metrics to use and why. Those metrics will be the ones communicated internally and/or to clients. Not only that, those metrics will be the benchmark for the next iteration: if someone wants to improve your model (for the same task), they have to improve that metric.
Tune the outputs to the audience and choose the right tools to present their work
Let’s recap where we are:
- We’ve mapped our DS product within the ecosystem and defined our constraints.
- We’ve built our system design and developed the Machine Learning model.
- We’ve evaluated it, and it’s accurate enough.
Now it’s finally time to present our work. This is crucial: the quality of your work is only as high as your ability to communicate it. The first thing we have to understand is:
If we’re showing this to a Staff Data Scientist for model evaluation, or to a Software Engineer so that they can bring our model to production, or to a Product Manager who will need to report the work to higher-level decision makers, we will need different kinds of deliverables.
This is the rule of thumb:
- A very high-level model overview and the metric results will be provided to Product Managers
- A more detailed explanation of the model internals and the metrics will be shown to Staff Data Scientists
- Very hands-on details, through code scripts and notebooks, will be handed to the superheroes who will bring this code into production: the Software Engineers.

Conclusions
In 2025, writing code is not what distinguishes Senior from Junior Data Scientists. Senior data scientists are not “better” because they know the TensorFlow documentation off the top of their heads. They’re better because they have a specific workflow that they adopt when they build a data-powered product.
In this article, we explained the standard Senior Data Scientist workflow through a six-stage process:
- A method to map the ecosystem before touching code (problem, baseline, users, definition of “better”)
- A framework to think about DS features like operators (latency, budget, reliability, failure modes, safest default)
- A lightweight pen-and-paper system design process (inputs/outputs, data sources, training loop, evaluation loop, integration)
- A modeling workflow that starts simple and adds complexity only when it’s necessary
- A practical way to interrogate outputs and metrics (manual review first, then the right metric for the product goal)
- A communication layer to tune the delivery to the audience (PM story, DS rigor, engineer-ready artifacts)
Before you head out
Thanks again for your time. It means a lot ❤️
My name is Piero Paialunga, and I’m this guy here:

I’m originally from Italy, hold a Ph.D. from the University of Cincinnati, and work as a Data Scientist at The Trade Desk in New York City. I write about AI, Machine Learning, and the evolving role of data scientists both here on TDS and on LinkedIn. If you liked the article and want to learn more about machine learning and follow my work, you can:
A. Follow me on Linkedin, where I publish all my stories
B. Follow me on GitHub, where you may see all my code
C. For questions, you can send me an email at
