What Advent of Code Has Taught Me About Data Science


This year, I took part in the Advent of Code, a series of daily programming challenges released throughout December, for the first time. Each day usually consists of two puzzles building on a similar problem. Although these challenges don't resemble typical data science workflows, I have found that many of the habits, ways of thinking, and approaches to problems they encourage translate surprisingly well to data-focused work. In this article, I reflect on five lessons I took from following the challenge this year and how they carry over to data science.

For me, Advent of Code was above all a controlled practice environment for revisiting fundamentals and working on my programming skills. You can focus on the essentials because the distractions you would face in a day-to-day job are absent: there are no meetings, shifting requirements, stakeholder communication, or coordination overhead. Instead, you get a feedback loop that is simple and binary: your answer is correct or it isn't. There is no "almost correct", no explaining away the result, and no selling your solution. At the same time, you have the freedom and flexibility to choose any approach you see fit, as long as you arrive at a correct solution.

Working in such a setting was challenging, yet very useful, because it exposed my habits. Given that there is very little room for ambiguity and no way to hide mistakes, any flaw in my work surfaced immediately. Over time, I also realized that most of the failures I encountered had little to do with syntax, algorithm selection, or implementation details, and much more with how I approached problems before touching any code. What follows are the key lessons I took from this experience.

Image created by the author with ChatGPT

Lesson 1: Sketch the Solution – Think Before You Code

One pattern that surfaced often during Advent of Code was my tendency to jump straight into implementation. When faced with a new problem, I was usually tempted to start coding immediately and try to converge on a solution as quickly as possible. Ironically, this approach often achieved exactly the opposite. For example, I once wrote deeply nested code to handle edge cases, which inflated the runtime, without realizing that a much simpler solution existed.

What eventually helped me was taking a step back before writing any code. Instead, I began by noting down requirements, inputs, and constraints. The process of writing these down gave me a degree of clarity and structure that I had been missing when jumping directly into the code. Moreover, thinking about possible approaches, outlining a rough solution, or writing some pseudocode helped to formalize the required logic even further. Once this was done, implementing it in code became a lot easier.
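To illustrate what this sketch-first workflow can look like, here is a minimal, hypothetical example in the spirit of an AoC-style puzzle (the puzzle and the function name are my own invention, not taken from a specific day). The requirements, constraints, and approach are written down as comments first, and the implementation follows directly from them:

```python
# Puzzle (hypothetical): given a list of integers, find the largest
# sum over any contiguous run of exactly k numbers.
#
# Inputs:      numbers (list[int]), window size k (1 <= k <= len(numbers))
# Constraints: up to ~10^6 numbers, so the solution should be O(n)
# Approach:    sliding window; maintain a running sum instead of
#              re-summing each window (which would be O(n * k))

def max_window_sum(numbers: list[int], k: int) -> int:
    if not 1 <= k <= len(numbers):
        raise ValueError("k must be between 1 and len(numbers)")
    window = sum(numbers[:k])  # sum of the first window
    best = window
    for i in range(k, len(numbers)):
        window += numbers[i] - numbers[i - k]  # slide the window one step
        best = max(best, window)
    return best

print(max_window_sum([2, -1, 3, 5, -2, 4], 3))  # prints 7
```

Writing the constraint (~10^6 numbers) down first is precisely what rules out the naive re-summing approach before a single line of it gets written.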

This lesson translates directly to data science, as many problems are difficult because of unclear goals, poorly framed objectives, or constraints and requirements that are not known well enough upfront. Defining the desired outcome and reasoning about the solution before starting to write code can prevent wasted effort. Working backward from the intended result, instead of going forward from a preferred technology, helps to keep the focus on the actual goal that needs to be achieved.

Lesson 2: Input Validation – Know Your Data

Even after adopting this approach of sketching solutions and defining the desired outcome upfront, another recurring obstacle surfaced: the input data. Some of the failures I experienced had nothing to do with faulty code but with assumptions I had made about the data that did not hold in practice. In one case, I assumed the data stayed within certain minimum and maximum bounds, which turned out to be wrong and led to an incorrect solution. After all, code can be correct in isolation, yet fail completely when run on data it was never designed to handle.

This again showed why checking the input data is so crucial. Often, my solution did not have to be reworked entirely; smaller adjustments such as additional conditions or boundary checks were enough to obtain a correct and robust solution. Moreover, an initial data investigation can offer signals about the scale of the data and indicate which approaches are feasible. When facing large ranges, extreme values, or high cardinality, it is very likely that brute-force methods, nested loops, or combinatorial approaches will quickly hit a limit.

Naturally, this is equally important in data science projects, where assumptions about data (implicit or explicit) can lead to serious issues if they remain unchecked. Investigating data early is an important step to prevent problems from propagating downstream, where they become much harder to fix. The key takeaway is not to avoid assumptions about data altogether, but rather to make them explicit, document them, and test them early in the process.
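As a minimal sketch of what making assumptions explicit can look like in practice (the column names, bounds, and data here are hypothetical, not from any real project), each documented assumption becomes a named check that runs before any downstream code:

```python
import pandas as pd

# Hypothetical input: a small table of sensor readings.
df = pd.DataFrame({
    "sensor_id": [1, 2, 3],
    "reading": [0.3, 0.9, 0.5],
})

# Each documented assumption becomes an explicit, named check.
assumptions = {
    "no missing values": lambda d: d.notna().all().all(),
    "readings within [0, 1]": lambda d: d["reading"].between(0, 1).all(),
    "sensor ids are unique": lambda d: d["sensor_id"].is_unique,
}

failed = [name for name, check in assumptions.items() if not check(df)]
if failed:
    raise ValueError(f"input data violates assumptions: {failed}")
```

Failing loudly at this stage is cheap; discovering the same violated assumption after a model has been trained is not.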

Lesson 3: Iterate Quickly – Progress Over Perfection

The puzzles in Advent of Code are split into two parts. The second part builds on the first but introduces a new constraint, challenge, or twist, such as an increase in the problem size. This increase in complexity often invalidated my initial solution to the first part. However, that does not make the first solution useless: it provides a valuable baseline.

Having such a working baseline helps to clarify how the problem behaves, how it can be tackled, and what the solution already achieves. From there, improvements can be made in a more structured way, as one knows which assumptions no longer hold and which parts must change to arrive at a successful solution. Refining a concrete baseline is therefore much easier than designing an abstract "perfect" solution right from the start.

In Advent of Code, the second part only appears after the first one is solved, making early attempts to find a solution that covers both parts pointless. This structure reflects a constraint commonly encountered in practice: one usually does not know all requirements upfront. Trying to anticipate every possible extension that might be needed is not only largely speculative but also inefficient.

In data science, similar principles apply. As requirements shift, data sources evolve, and stakeholders refine their needs, projects and solutions must evolve as well. Starting with simple solutions and iterating based on real feedback is far more effective than trying to build a fully general system from the outset. Such a "perfect" solution is never visible at the start; iteration is what allows solutions to converge toward something useful.

Lesson 4: Design for Scale – Know the Limits

While iteration argues for starting with simple solutions, Advent of Code also repeatedly demonstrates the importance of understanding scale and how it affects the choice of approach. In many puzzles, the second part does not simply add logical complexity but also increases the problem size dramatically. Thus, a solution with exponential or factorial complexity may be sufficient for the first part but becomes impractical once the problem size grows in the second.

Even when starting with a simple baseline, it is crucial to have a rough idea of how that solution will scale. Nested loops, brute-force enumeration, or exhaustive searches over combinations are signals that the solution will stop working efficiently as the problem size grows. Knowing the (approximate) breaking point therefore makes it easier to gauge if, or when, a rewrite is necessary.
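To make this concrete, here is a small, hypothetical sketch (not taken from a specific puzzle) of the same task solved twice, together with a back-of-the-envelope estimate of where the naive version breaks down:

```python
import itertools

def has_pair_with_sum_bruteforce(numbers: list[int], target: int) -> bool:
    """O(n^2): checks every pair; fine for a few thousand numbers."""
    return any(a + b == target
               for a, b in itertools.combinations(numbers, 2))

def has_pair_with_sum_fast(numbers: list[int], target: int) -> bool:
    """O(n): a set of already-seen values replaces the inner loop."""
    seen: set[int] = set()
    for n in numbers:
        if target - n in seen:
            return True
        seen.add(n)
    return False

# Rough breaking point: at ~10^8 simple operations per second, the
# brute-force version needs ~n^2 / 2 checks, so n = 100,000 already
# means ~5 * 10^9 checks (minutes instead of milliseconds).
```

The point is not to always write the fast version first, but to know roughly where the simple one stops being an option.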

This doesn’t contradict the thought of avoiding premature optimization. Moderately, it indicates that one should understand the trade-offs an answer makes without having to implement essentially the most efficient or scalable approach straight away. Designing for scale means having an awareness of scalability and complexity, not having to optimize blindly from the beginning.

The parallel to data science holds here as well: solutions may work well on sample or limited datasets but are prone to fail when faced with production-level sizes. Being conscious of these bottlenecks, recognizing likely limits, and keeping alternative approaches in mind makes systems more resilient. Knowing where a solution could stop working can prevent costly redesigns and rewrites later, even if the alternatives are not implemented immediately.

Lesson 5: Be Consistent – Momentum Beats Motivation

One of the less obvious takeaways from participating in the Advent of Code had less to do with problem solving and much more with "showing up". Solving a puzzle every day sounds manageable in theory, but in practice it was difficult, especially when it collided with fatigue, limited time, or fading motivation after a full day of work. Hoping for motivation to magically reappear was not a viable strategy.

Real progress came from working on problems every day, not from occasional bursts of inspiration. The repetition reinforced ways of thinking about and breaking down problems, which in turn created momentum. Once that momentum was built, progress began to compound, and consistency mattered more than intensity.

Skill development in data science rarely comes from one-off projects or isolated deep dives either. Instead, it results from repeated practice: reading data carefully, designing solutions, iterating on models, and debugging assumptions, all done consistently over time. Relying on motivation alone is not sustainable, but fixed routines are. Advent of Code exemplified this distinction: motivation fluctuates, while consistency compounds. Having a daily structure helped turn solving puzzles into a habit rather than an aspiration.

Image generated by the author with ChatGPT

Closing Thoughts

Looking back, the real value I derived from participating in Advent of Code was not in solving individual puzzles or picking up new coding tricks, but in making my habits visible. It highlighted where I tend to rush toward solutions, where I tend to overcomplicate, and where slowing down and taking a step back would have saved me a lot of time. The puzzles as such were only a means to an end; the lessons I took from them were the true value.

Advent of Code worked best for me when treated as deliberate practice rather than as a competition. Showing up consistently, favoring clarity over cleverness, and refining solutions instead of chasing a perfect one from the start turned out to be far more valuable than any single solution.

If you have not tried it yourself yet, I would recommend giving it a shot, either during the event next year or by working through past puzzles. The process quickly surfaces habits that carry over beyond the puzzles themselves. And if you enjoy tackling challenges, you will most likely find it a genuinely fun and rewarding experience.
