Advent of Code is an annual advent calendar of programming puzzles themed around helping Santa’s elves prepare for Christmas. The whimsical setting masks the fact that many puzzles call for serious algorithmic problem-solving, especially towards the end of the calendar. In a previous article, we discussed the importance of algorithmic thinking for data scientists even as AI-assisted coding becomes the norm. With Advent of Code 2025 having wrapped up last month, this article takes a closer look at a number of problems from the event that are especially relevant for data scientists. We’ll sketch out some interesting solution approaches in Python, highlighting algorithms and libraries that can be leveraged in a wide range of real-world data science use cases.
Navigating Tachyon Manifolds with Sets and Dynamic Programming
The first problem we will look at is Day 7: Laboratories. We are given a tachyon manifold in a file called input_d7.txt, as shown below:
.......S.......
...............
.......^.......
...............
......^.^......
...............
.....^.^.^.....
...............
....^.....^....
...............
...^.^...^.^...
...............
..^...^.....^..
...............
.^...^.^.....^.
...............
A tachyon beam (“|”) starts at the top of the manifold and travels downward. If the beam hits a splitter (“^”), it splits into two beams, one on either side of the splitter. Part One of the puzzle asks us to determine the number of times a beam will split given a set of initial conditions (starting position of the beam and the manifold layout). Note that simply counting the number of splitters and multiplying by two will not give the correct answer, since overlapping beams are only counted once, and some splitters are never reached by any of the beams. We can leverage set algebra to account for these constraints as shown in the implementation below:
import functools

def find_all_indexes(s, ch):
    """Return a set of all positions where character ch appears in s."""
    return {i for i, c in enumerate(s) if c == ch}

with open("input_d7.txt") as f:
    first_row = f.readline()  # row containing initial beams ('S')
    f.readline()              # skip separator line
    rows = f.readlines()      # remaining manifold rows

beam_ids = find_all_indexes(first_row, "S")  # active beam column positions
split_counter = 0  # total number of splits

for row_index, line in enumerate(rows):
    # Only even-indexed rows contain splitters
    if row_index % 2 != 0:
        continue
    # Find splitter positions in this row
    splitter_ids = find_all_indexes(line, "^")
    # Beams that hit a splitter (intersection)
    hits = beam_ids.intersection(splitter_ids)
    split_counter += len(hits)
    # New beams created by splits (left and right)
    if hits:
        new_beams = functools.reduce(lambda acc, h: acc.union({h - 1, h + 1}), hits, set())
    else:
        new_beams = set()
    # Update active beams (add new beams, remove beams that hit splitters)
    beam_ids = beam_ids.union(new_beams).difference(splitter_ids)

print(split_counter)
We use the intersection operation to identify the splitters that are directly hit by active beams coming from above. New beams are created to the left and right of each splitter that is hit, but overlapping beams are only counted once thanks to the union operator. The set of beams emerging from each layer of splitters in the tachyon manifold is computed with functools.reduce, a higher-order function commonly seen in functional programming that helps keep the code concise. The difference operator ensures that the original beams incident on the splitters are not counted among the set of outgoing active beams.
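If the reduce call feels opaque, a plain set comprehension expresses the same update. A minimal sketch, reusing the names hits and new_beams from the code above with a made-up example value:

# Equivalent to the functools.reduce call above: collect the columns
# immediately left and right of every splitter that was hit.
hits = {4, 7}  # example set of hit splitter columns
new_beams = {neighbor for h in hits for neighbor in (h - 1, h + 1)}
print(new_beams)  # {3, 5, 6, 8}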
In a classical system, if a tachyon particle is sent through the manifold and encounters a splitter, the particle can only continue along one unique path to the left or right of the splitter. Part Two of the puzzle introduces a quantum version of this setup, in which a particle simultaneously goes down both the left and right paths, effectively spawning two parallel timelines. Our task is to determine the total number of timelines that exist after a particle has traversed all viable paths in such a quantum tachyon manifold. This problem can be solved efficiently using dynamic programming as shown below:
from functools import lru_cache

def count_timelines_with_dfs_and_memo(path):
    """Count distinct quantum timelines using DFS + memoization (top-down DP)."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    height = len(lines)
    width = len(lines[0])
    # Find starting column
    start_col = next(i for i, ch in enumerate(lines[0]) if ch == "S")

    @lru_cache(maxsize=None)
    def dfs_with_memo(row, col):
        """Return number of timelines from (row, col) to the bottom using DFS + memoization."""
        # Out of bounds horizontally
        if col < 0 or col >= width:
            return 0
        # Past the bottom row: one complete timeline
        if row == height:
            return 1
        if lines[row][col] == "^":
            # Split left and right
            return dfs_with_memo(row + 1, col - 1) + dfs_with_memo(row + 1, col + 1)
        else:
            # Continue straight down
            return dfs_with_memo(row + 1, col)

    return dfs_with_memo(1, start_col)

print(count_timelines_with_dfs_and_memo("input_d7.txt"))
Recursive depth-first search with memoization gives us a top-down form of dynamic programming, where each subproblem is solved once and reused many times. Two base cases are defined: no valid timeline is created if a particle goes out of bounds horizontally, and a complete timeline is counted once the particle reaches the bottom of the manifold. The recursive step accounts for two cases: whenever the particle reaches a splitter, it branches into two timelines; otherwise it continues straight down in the current timeline. Memoization (using the @lru_cache decorator) prevents recalculation of known values when multiple paths converge at the same location in the manifold.
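The same recurrence can also be evaluated bottom-up by sweeping the manifold row by row and keeping a running count of timelines per column. Below is a minimal sketch of that alternative tabulation (same file name and grid conventions as above); it is an equivalent formulation, not the solution shown earlier:

from collections import defaultdict

def count_timelines_bottom_up(path):
    """Tabulate timeline counts per column, one row at a time (bottom-up DP)."""
    with open(path) as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    width = len(lines[0])
    start_col = lines[0].index("S")

    counts = {start_col: 1}  # column -> number of timelines currently passing through it
    for row in lines[1:]:
        next_counts = defaultdict(int)
        for col, n in counts.items():
            if row[col] == "^":
                # Each timeline branches left and right; out-of-bounds branches are dropped
                if col - 1 >= 0:
                    next_counts[col - 1] += n
                if col + 1 < width:
                    next_counts[col + 1] += n
            else:
                next_counts[col] += n
        counts = next_counts
    return sum(counts.values())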
In practice, data scientists can use the tools and techniques described above in a variety of situations. The concept of beam splitting is analogous in many ways to the proliferation of data packets in a complex communications network. Simulating the cascading process is a bit like modeling supply chain disruptions, epidemics, and information diffusion. At a more abstract level, the puzzle can be framed as a constrained graph traversal or path counting problem. Set algebra and dynamic programming are versatile concepts that data scientists can use to solve such seemingly difficult algorithmic problems.
Constructing Circuits with Nearest Neighbor Search
The next problem we will look at is Day 8: Playground. We are given a list of triples representing the 3D location coordinates of electrical junction boxes in a file called input_d8.txt, as shown below:
162,817,810
59,618,56
901,360,560
…
In Part One, we are asked to successively identify and connect the pairs of junction boxes that are closest together in terms of straight-line (or Euclidean) distance. Connected boxes form a circuit through which electricity can flow. The task is ultimately to report the result of multiplying together the sizes of the three largest circuits after connecting the 1000 pairs of junction boxes that are closest together. One neat solution involves using a min-heap to store pairs of junction box coordinates. Following is an implementation based on an instructive video by James Peralta:
from collections import defaultdict
import heapq
from math import dist as euclidean_dist

# Load points
with open("input_d8.txt") as f:
    points = [tuple(map(int, line.split(","))) for line in f.read().split()]

k = 1000

# Construct min-heap of all pairwise distances
dist_heap = [
    (euclidean_dist(points[i], points[j]), points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
]
heapq.heapify(dist_heap)

# Take k shortest edges and construct adjacency list
neighbors = defaultdict(list)
for _ in range(k):
    _, a, b = heapq.heappop(dist_heap)
    neighbors[a].append(b)
    neighbors[b].append(a)

# Use DFS to compute component size
def dfs(start, seen):
    stack = [start]
    seen.add(start)
    size = 0
    while stack:
        node = stack.pop()
        size += 1
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return size

# Compute sizes of all connected components
seen = set()
sizes = [dfs(p, seen) for p in points if p not in seen]

# Derive final answer
sizes.sort(reverse=True)
a, b, c = sizes[:3]
print("Solution:", a * b * c)
A min-heap is a binary tree in which parent nodes have values less than or equal to the values of their child nodes; this ensures that the smallest value is stored at the top of the tree and can be accessed efficiently. In the above solution, this useful property of min-heaps is used to quickly identify the nearest neighbors among the given junction boxes. The 1000 nearest pairs thus identified form a 3D graph. Depth-first search is used to traverse the graph starting from a given junction box and count the number of boxes that belong to the same connected component (i.e., circuit).
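Since only the k shortest edges are ever needed, heapq.nsmallest offers a compact alternative to heapifying the full list and popping k times. A minimal sketch, assuming the points list and k from the code above:

import heapq
from math import dist as euclidean_dist

# Assumes `points` (list of coordinate tuples) and `k` are defined as above.
# nsmallest keeps only the k best candidates while scanning all pairs.
shortest_edges = heapq.nsmallest(
    k,
    (
        (euclidean_dist(points[i], points[j]), points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    ),
    key=lambda edge: edge[0],
)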
In Part Two, resource scarcity is introduced (not enough extension cables). We must now keep connecting the closest unconnected pairs of junction boxes until they are all part of one large circuit. The required answer is the result of multiplying together the x-coordinates of the last two junction boxes that get connected. To solve this problem, we can use a union-find data structure and Kruskal’s algorithm for constructing minimum spanning trees as follows:
import heapq
from math import dist as euclidean_dist

# Load points
with open("input_d8.txt") as f:
    points = [tuple(map(int, line.split(","))) for line in f.read().split()]

# Construct min-heap of all pairwise distances
dist_heap = [
    (euclidean_dist(a, b), a, b)
    for i, a in enumerate(points)
    for b in points[i + 1:]
]
heapq.heapify(dist_heap)

# Define functions to implement Union-Find
parent = {p: p for p in points}

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])  # path compression
    return parent[x]

def union(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False
    parent[rb] = ra
    return True

# Use Kruskal's algorithm to connect points until all are in a single component
edges_used = 0
last_pair = None
while dist_heap:
    _, a, b = heapq.heappop(dist_heap)
    if union(a, b):
        edges_used += 1
        last_pair = (a, b)
        if edges_used == len(points) - 1:
            break

# Derive final answer
x_product = last_pair[0][0] * last_pair[1][0]
print(x_product)
The location data is stored in a min-heap and connected graph components are built up edge by edge. We repeatedly take the shortest remaining edge between two points and only keep that edge if it connects two previously unconnected components; this is the basic idea behind Kruskal’s algorithm. But to do this efficiently, we need a way of quickly determining whether two points are already connected. If they are, then union(a, b) == False, and we skip the edge to avoid creating a cycle. Otherwise, we merge their components. Union-find is a data structure that can perform this check in nearly constant time. To use a corporate analogy, it is a bit like asking “Who’s your boss?” repeatedly until you reach the CEO and then rewriting the value of everyone’s boss to be the name of the CEO (i.e., the root). Next time someone asks “Who’s your boss?”, you can respond immediately with the CEO’s name. If the roots of two nodes differ, the respective components are merged by attaching one root to the other.
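The find function above already applies path compression; a common companion optimization is union by size (or rank), which always attaches the smaller component under the larger one. A minimal sketch, assuming the same parent dictionary and find function as above:

# Assumes `parent` and `find` are defined as in the code above.
size = {p: 1 for p in parent}  # component sizes, one per initial root

def union_by_size(a, b):
    ra, rb = find(a), find(b)
    if ra == rb:
        return False
    # Attach the smaller tree under the root of the larger tree
    if size[ra] < size[rb]:
        ra, rb = rb, ra
    parent[rb] = ra
    size[ra] += size[rb]
    return True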
The circuit-building problem relates to clustering and community detection, which are essential concepts for real-life data science use cases. For instance, constructing graph components by identifying nearest neighbors can be part of a practical algorithm for grouping customers by similarity of preferences, detecting communities in social networks, and clustering geographical locations. Kruskal’s algorithm can be used to design and optimize networks by minimizing routing costs. Abstract concepts such as Euclidean distances, min-heaps, and union-find help us measure, prioritize, and organize data at scale; a nearest-neighbor sketch using a spatial index is shown below.
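For larger point sets, building the full quadratic list of pairwise distances becomes expensive; a spatial index such as a k-d tree answers nearest-neighbor queries much faster. A minimal sketch using scipy.spatial.KDTree on the same points list as above (the choice of library is ours, not part of the puzzle):

import numpy as np
from scipy.spatial import KDTree

# Assumes `points` is the list of 3D coordinate tuples loaded earlier.
coords = np.array(points)
tree = KDTree(coords)

# For each point, find its single nearest neighbor (k=2 because the
# closest match of a point is always the point itself).
distances, indices = tree.query(coords, k=2)
nearest_neighbor = indices[:, 1]   # index of each point's closest other point
nearest_distance = distances[:, 1]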
Configuring Factory Machines with Linear Programming
Next, we will walk through the problem posed in Day 10: Playground. We are given a manual for configuring factory machines in a file called input_d10.txt, as shown below:
[.##.] (2) (0,3) (2) (2,3) (0,2) (0,1) {3,5,4,7}
[..##.] (0,2,3) (2,3) (0,4) (0,1,2) (1,2,3,4) {7,5,12,8,2}
[.###.#] (0,1,2,3) (0,3,4) (0,1,2,4,5) (1,2) {10,11,9,5,10,5}
Each line describes one machine. The number of characters in the square brackets reflects the number of indicator lights and their desired states (“.” means off and “#” means on). All lights are initially off. Button wiring schematics are shown in parentheses; e.g., pressing the button with schematic “(2, 3)” will flip the current states of the indicator lights at positions 2 and 3 from “.” to “#” or vice versa. The objective of Part One is to determine the minimum number of button presses needed to correctly configure the indicator lights on all given machines. An elegant solution using mixed-integer linear programming (MILP) is shown below:
import re
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Parse a single machine description line
def parse_machine(line: str):
    # Extract light pattern
    match = re.search(r"\[([.#]+)\]", line)
    if not match:
        raise ValueError(f"Invalid line: {line}")
    pattern = match.group(1)
    m = len(pattern)
    # Goal vector: '#' -> 1, '.' -> 0
    goal = np.fromiter((ch == "#" for ch in pattern), dtype=int)
    # Extract button wiring
    buttons = [
        [int(x) for x in grp.split(",")] if grp.strip() else []
        for grp in re.findall(r"\(([^)]*)\)", line)
    ]
    # Construct toggle matrix A
    n = len(buttons)
    A = np.zeros((m, n), dtype=int)
    for j, btn in enumerate(buttons):
        for idx in btn:
            if not (0 <= idx < m):
                raise ValueError(f"Button index {idx} out of range for {m} lights")
            A[idx, j] = 1
    return A, goal

# Solve all machines in the input file
def solve_d10_part1(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    total = 0
    for line in lines:
        A, goal = parse_machine(line)
        m, n = A.shape
        # Objective: minimize sum(x)
        c = np.r_[np.ones(n), np.zeros(m)]
        # Specify constraint: A x - 2 s = goal
        A_eq = np.hstack([A, -2 * np.eye(m)])
        lc = LinearConstraint(A_eq, goal, goal)
        # Define bounds: x in [0, 1], s >= 0
        lb = np.zeros(n + m)
        ub = np.r_[np.ones(n), np.full(m, np.inf)]
        bounds = Bounds(lb, ub)
        # Specify integrality: all variables are integers
        # (together with the [0, 1] bounds, this makes the button presses binary)
        integrality = np.ones(n + m)
        res = milp(c=c, constraints=[lc], integrality=integrality, bounds=bounds)
        if not res.success:
            raise RuntimeError(f"No feasible solution for line: {line}")
        total += round(res.x[:n].sum())
    return total

print(solve_d10_part1("input_d10.txt"))
First, each machine is encoded as a matrix A in which the rows are the lights and the columns are the buttons: A[i, j] = 1 if button j toggles light i. Regular expressions are used for pattern matching on the input data. Next, we set up the optimization problem with a binary button-press vector x, integer slack variables s, and a goal light pattern g. For each machine, our aim is to choose button presses x such that x_j = 1 if the j-th button is pressed and 0 otherwise. The condition “after pressing buttons x, the lights equal goal g” reflects the congruence A·x ≡ g (mod 2), but since the MILP solver cannot deal with mod 2 directly, we express the condition as A·x − 2·s = g for some vector s consisting only of non-negative integers; this reformulation works because subtracting an even number does not change parity. The integrality specification declares all decision variables integer; combined with the 0–1 bounds on the first n variables, the button presses are effectively binary, while the slack variables are non-negative integers. We then run the MILP solver with the objective of minimizing the number of button presses needed to reach the goal state. If the solver succeeds, res.x[:n] contains the optimal button-press decisions and the code adds the number of pressed buttons to a running total.
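To make the parity trick concrete, here is a tiny self-contained toy machine (two lights, three hypothetical buttons, not taken from the puzzle input) solved with the same formulation:

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Toy machine: button 0 toggles light 0, button 1 toggles lights 0 and 1,
# button 2 toggles light 1. Goal pattern is "#.".
A = np.array([[1, 1, 0],
              [0, 1, 1]])
g = np.array([1, 0])
m, n = A.shape

c = np.r_[np.ones(n), np.zeros(m)]          # minimize the number of presses
A_eq = np.hstack([A, -2 * np.eye(m)])       # A x - 2 s = g
lc = LinearConstraint(A_eq, g, g)
bounds = Bounds(np.zeros(n + m), np.r_[np.ones(n), np.full(m, np.inf)])
integrality = np.ones(n + m)                # integer x and s; bounds make x binary

res = milp(c=c, constraints=[lc], integrality=integrality, bounds=bounds)
print(res.x[:n])                 # expected: [1. 0. 0.] -> press button 0 once
print(round(res.x[:n].sum()))    # expected: 1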
In Part Two, the task is to reach a goal state described by the so-called “joltage” requirements, which are shown in curly braces for each machine. The joltage counters of a machine are initially set to 0, and buttons can be pressed any number of times to update the joltage levels. For instance, the first machine starts with joltage values “{0, 0, 0, 0}”. Pressing button “(3)” once, “(1, 3)” three times, “(2,3)” three times, “(0,2)” once, and “(0,1)” twice produces the goal state “{3, 5, 4, 7}”. This also happens to be the fewest button presses needed to reach the goal state. Our task is to compute the minimum number of button presses needed to achieve the goal joltage states for all machines. Again, this can be solved using MILP as follows:
import re
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def parse_machine(line: str):
    # Extract joltage requirements
    match = re.search(r"\{([^}]*)\}", line)
    if not match:
        raise ValueError(f"No joltage requirements in line: {line}")
    goal = np.fromiter((int(x) for x in match.group(1).split(",")), dtype=int)
    m = len(goal)
    # Extract button wiring
    buttons = [
        [int(x) for x in grp.split(",")] if grp.strip() else []
        for grp in re.findall(r"\(([^)]*)\)", line)
    ]
    # Construct A (m × n)
    n = len(buttons)
    A = np.zeros((m, n), dtype=int)
    for j, btn in enumerate(buttons):
        for idx in btn:
            if not (0 <= idx < m):
                raise ValueError(f"Button index {idx} out of range for {m} counters")
            A[idx, j] += 1
    return A, goal

def solve_machine(A, goal):
    m, n = A.shape
    # Minimize sum(x)
    c = np.ones(n)
    # Constraint: A x = goal
    lc = LinearConstraint(A, goal, goal)
    # Bounds: x >= 0
    bounds = Bounds(np.zeros(n), np.full(n, np.inf))
    # All x are integers
    integrality = np.ones(n, dtype=int)
    res = milp(c=c, constraints=[lc], integrality=integrality, bounds=bounds)
    if not res.success:
        raise RuntimeError("No feasible solution")
    return int(round(res.fun))

def solve_d10_part2(filename):
    with open(filename) as f:
        lines = [line.strip() for line in f if line.strip()]
    return sum(solve_machine(*parse_machine(line)) for line in lines)

print(solve_d10_part2("input_d10.txt"))
Whereas Part One was a parity problem, Part Two is a counting problem. The core constraint of Part Two can be captured by the linear equation A·x = g, and no slack variables are needed. In a way, Part Two is reminiscent of the integer knapsack problem, where a knapsack must be filled with the right combination of differently weighted/sized objects; a toy illustration of that analogy follows.
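As a hypothetical illustration (not part of the puzzle), suppose we must fill a counter to exactly 11 using items of size 2, 3, and 7 while taking as few items as possible; the same milp pattern applies:

import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# Hypothetical "exact fill" example: reach a total of exactly 11
# using as few items of size 2, 3, and 7 as possible.
sizes = np.array([[2, 3, 7]])   # 1 x 3 "wiring" matrix
target = np.array([11])

c = np.ones(3)                                    # minimize total item count
lc = LinearConstraint(sizes, target, target)      # sizes @ x == 11
bounds = Bounds(np.zeros(3), np.full(3, np.inf))  # non-negative counts
res = milp(c=c, constraints=[lc], integrality=np.ones(3), bounds=bounds)
print(res.x, int(round(res.fun)))  # e.g. [2. 0. 1.] and 3  (2 + 2 + 7 = 11)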
Optimization problems such as these often feature in data science use cases in domains like logistics, supply chain management, and financial portfolio management. The underlying aim is to minimize or maximize some objective function subject to various constraints. Data scientists would also do well to master modular arithmetic; see this article for a conceptual overview of modular arithmetic and an exploration of its practical use cases in data science. Finally, there is an interesting conceptual link between MILP and the notion of feature selection with regularization in machine learning. Feature selection is about choosing the smallest number of features to train a model without adversely affecting predictive performance. Using MILP is like performing an explicit combinatorial search over feature subsets with pruning and optimization. L1 regularization amounts to a continuous relaxation of MILP; the L1 penalty nudges the coefficients of unimportant features towards zero. L2 regularization relaxes the MILP constraints even further by shrinking the coefficients of unimportant features without setting them to exactly zero.
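A minimal sketch of L1-based feature selection with scikit-learn, using synthetic data purely for illustration:

import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of ten features actually matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# The L1 penalty drives the coefficients of irrelevant features to zero.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)  # expected to contain (roughly) just features 0 and 1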
Reactor Troubleshooting with Network Analysis
The last problem we will look at is Day 11: Reactor. We are given a dictionary-like representation of a network of nodes and edges in a file called input_d11.txt, as shown below:
you: hhh ccc
hhh: ccc fff iii
…
iii: out
The keys and values are source and destination nodes (or devices, as per the problem storyline), respectively. In the above example, node “you” is connected to nodes “hhh” and “ccc”. The task in Part One is to count the number of different paths through the network that go from node “you” to “out”. This can be done using depth-first search as follows:
from collections import defaultdict

def parse_input(filename):
    """
    Parse the input file into a directed graph.
    Each line has the format: source: dest1 dest2 ...
    """
    graph = defaultdict(list)
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            src, dests = line.split(":")
            src = src.strip()
            for d in dests.strip().split():
                graph[src].append(d.strip())
    return graph

def dfs_paths(graph, start, goal):
    """
    Generate all paths from start to goal using DFS.
    """
    stack = [(start, [start])]
    while stack:
        (node, path) = stack.pop()
        for next_node in graph.get(node, []):
            if next_node in path:
                # Avoid cycles
                continue
            if next_node == goal:
                yield path + [next_node]
            else:
                stack.append((next_node, path + [next_node]))

def solve_d11_part1(filename):
    graph = parse_input(filename)
    all_paths = list(dfs_paths(graph, "you", "out"))
    print(len(all_paths))

solve_d11_part1("input_d11.txt")
We use an explicit stack to implement the search. Each stack entry holds information about the current node and the path taken so far. For each neighbor, we skip it if it is already in the path, yield the completed path if the neighbor is the “out” node, or push the neighbor and the updated path onto the stack to continue exploring the remaining network. The search process thus enumerates all valid paths from “you” to “out”, and the final code output is the count of distinct valid paths.
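If the networkx library is available, the same enumeration can be cross-checked with its built-in path generator. A minimal sketch, assuming the parse_input function defined above:

import networkx as nx

# Assumes parse_input (defined above) returns a dict of source -> list of destinations.
graph = parse_input("input_d11.txt")
G = nx.DiGraph()
G.add_edges_from((src, dst) for src, dsts in graph.items() for dst in dsts)

# all_simple_paths yields every cycle-free path between the two nodes.
n_paths = sum(1 for _ in nx.all_simple_paths(G, source="you", target="out"))
print(n_paths)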
In Part Two, we are asked to count the number of paths that go from “svr” to “out” via the nodes “dac” and “fft”. The constraint of intermediate nodes effectively restricts the number of valid paths in the network. Following is a sample solution:
from collections import defaultdict
from functools import lru_cache

def parse_input(filename):
    graph = defaultdict(list)
    with open(filename) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            src, dests = line.split(":")
            src = src.strip()
            dests = [d.strip() for d in dests.strip().split()]
            graph[src].extend(dests)
            for d in dests:
                if d not in graph:
                    graph[d] = []
    return graph

def count_paths_with_constraints(graph, start, goal, must_visit):
    must_visit = frozenset(must_visit)

    @lru_cache(maxsize=None)
    def dfs(node, seen_required):
        if node == goal:
            return 1 if seen_required == must_visit else 0
        total = 0
        for nxt in graph[node]:
            # Track only visits to the required intermediate nodes;
            # the network is assumed to be free of cycles (a DAG)
            new_seen = seen_required | (frozenset([nxt]) & must_visit)
            total += dfs(nxt, new_seen)
        return total

    return dfs(start, frozenset([start]) & must_visit)

def solve_d11_part2(filename):
    graph = parse_input(filename)
    must_visit = {"dac", "fft"}
    total_valid_paths = count_paths_with_constraints(graph, "svr", "out", must_visit)
    print(total_valid_paths)

solve_d11_part2("input_d11.txt")
The code builds on the logic of Part One, except that we now additionally keep track of visits to the intermediate nodes “dac” and “fft” within the depth-first search routine. As in the quantum tachyon manifold puzzle, we leverage memoization to avoid redundant computations.
Problems involving network analysis are a staple of data science. Path enumeration is directly relevant to use cases involving telecommunications, internet routing, and power grid optimization. Complex ETL pipelines are often represented as networks (e.g., directed acyclic graphs), and path counting algorithms can be used to identify critical dependencies or bottlenecks in the workflow. In the context of recommender engines powered by knowledge graphs, analyzing paths flowing through the graph can help with the interpretation of recommender responses. Such recommenders can use paths between entities to justify recommendations, making the system transparent by showing how a suggested item is connected to a user’s known preferences – after all, we can explicitly trace the reasoning.
The Wrap
In this article we have seen how the playful scenarios that frame Advent of Code puzzles can surface genuinely powerful ideas, ranging from graph search and optimization to linear programming, combinatorics, and constraint solving. By dissecting these problems and experimenting with different solution strategies, data scientists can sharpen their algorithmic instincts and build a versatile toolkit that transfers directly to practical work spanning feature engineering, model interpretability, optimization pipelines, and more. As AI-assisted coding continues to evolve, the ability to frame, solve, and critically reason about such problems will likely remain a key differentiator for data scientists. Advent of Code offers a fun, low-stakes way to keep those skills sharp – readers are encouraged to attempt the other puzzles in the 2025 edition and experience the joy of cracking tough problems using algorithmic thinking.
