Table of Contents
📄Python Notebook
🍯Introduction
🔍Example ABC Agent Search Progress
⏳Agent Lifecycle in Swarm Optimization
🐝The three Bee Agent Roles
🪻Iris Dataset
❄ Clustering – No labels? No problem!
🏋️Fitness Model for Clustering
🤔Confusion Matrix as a Diagnostic Tool
🏃Running the Agentic AI Loop
📊Reporting Results
💬Designing Agent Prompts for Gemini
⚠️Gemini Agentic AI Issues
⚔️Agentic AI Competitive Landscape towards 2026
✨Conclusion and Future Work
📄Python Notebook
🍯 Introduction
With the incredible innovation occurring around Agentic AI, I wanted to get hands‑on with a project that integrates LLM prompts directly into a Data Science workflow. The Artificial Bee Colony (ABC) algorithm is inspired by honey bees' foraging behavior, which works remarkably well in nature. It belongs to the family of swarm intelligence algorithms, designed for decentralized decision‑making processes whereby "bee agents" pursue their individual goals autonomously while collectively improving the quality of the overall solution (the "honeypot").
This popular technique has been widely applied to many domains, specifically: scheduling, routing, energy optimization, resource allocation, and anomaly detection. Researchers often combine ABC with neural networks in a hybrid approach, for instance, using ABC to tune hyperparameters or optimize model weights. The algorithm is especially relevant when data is scarce or when the problem is combinatorial – when the solution space grows exponentially (or even factorially) with the number of features.
In this project, my approach has been to mimic Swarm Optimization for an Adaptive Grid Search. The creative twist is that I applied Google's latest Agentic AI tools to implement the bee agents. In the ABC algorithm, there are three types of autonomous bee agents, and I defined their roles using text prompts powered by the latest Gemini LLMs.
Each foraging cycle (algorithm iteration) proceeds as follows (a toy sketch follows the list):
- Scout bees explore → discover new food sources (candidate solutions).
- Employed bees exploit → refine those sources and dance to share information about the quality of the nectar (fitness function).
- Onlooker bees exploit further → guided by the dances, they reinforce the colony's focus on the best food sources.
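To make the cycle concrete, here is a toy, self-contained sketch of the explore/exploit/select loop on a one-dimensional objective. The function, ranges, and names are purely illustrative; the real project optimizes clustering hyperparameters via LLM agents instead.
```python
import random

def fitness(x):
    # Toy stand-in for 1 - ARI: minimize the distance to an optimum at x = 3.
    return (x - 3.0) ** 2

def foraging_cycle(pool):
    scouts = [random.uniform(-10, 10) for _ in range(3)]            # scouts explore
    employed = [x * (1 + random.uniform(-0.2, 0.2)) for x in pool]  # employed bees exploit locally
    candidates = pool + scouts + employed
    candidates.sort(key=fitness)                                    # onlookers reinforce the best sources
    return candidates[:len(pool)]

pool = [random.uniform(-10, 10) for _ in range(5)]
for _ in range(20):
    pool = foraging_cycle(pool)
print(f"best solution ≈ {pool[0]:.3f}")  # converges toward 3.0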
🔍Example ABC Agent Search Progress
⏳Agent Lifecycle in Swarm Optimization
The ABC algorithm was first proposed by Derviş Karaboğa in 2005. In my modernized meta‑heuristic adaptation, I focused on improving clustering performance for an unsupervised dataset.
Below are the Python classes I implemented:
- WebResearcher: Responsible for researching and summarizing scikit-learn clustering algorithms and their key hyperparameters. The information gathered is crucial for generating accurate and effective prompts for the bee agents, and this class is implemented as an LLM‑based agent.
- ScoutBeeAgent: Generates diverse initial candidate clustering solutions for the Iris dataset, leveraging the parameter summaries provided by the WebResearcher.
- EmployedBeeAgent: Refines existing candidate solutions by exploring local parameter neighborhoods, using the WebResearcher’s insights to make informed adjustments.
- OnlookerBeeAgent: Evaluates the generated and refined candidates, selecting the most promising ones to carry forward to the next iteration.
- Runner: Orchestrates the overall ABC optimization loop, organizing and coordinating the Gemini AI agent flow. It manages sequencing between the different bee agents and tracks global progress. While the Runner ensures structure and oversight, each bee agent operates in a fully distributed and autonomous manner, independently performing its specialized tasks without centralized control.
- FitnessModel: Evaluates the quality of each candidate solution using the Adjusted Rand Index (ARI), with the objective of minimizing 1 – ARI to achieve better clustering solutions.
- Reporter: Visualizes the convergence of the best ARI values over iterations and compares the top‑performing solutions against baseline clustering models.
🐝The three Bee Agent Roles
The agents determine parameter values and ranges through natural language prompts provided to the Gemini generative AI model. All three agents inherit from the BeeAgent base class, which handles shared setup and candidate tracking. Part of each prompt is informed by the WebResearcher, which summarizes scikit-learn clustering algorithms and their key hyperparameters to ensure accuracy and relevance. Here's how each agent works:
- 🐝ScoutBeeAgent (Initial Parameter Generation): Constructs prompts that allow the LLM some creativity within defined constraints. The allowed_algorithms parameter guides which models to consider from the popular clustering algorithms in scikit‑learn. The Gemini model interprets these instructions and generates diverse candidate solutions, ensuring no duplicates and a balanced distribution across algorithms.
- 🐝EmployedBeeAgent (Parameter Refinement): Generates prompts with refining instructions, directing the LLM to adjust parameters by roughly ±10–20%, remain within valid ranges, and avoid inventing unsupported parameters. It takes the current solutions and applies these rules to create slightly varied (refined) candidates within the local neighborhood of the existing parameter space.
- 🐝OnlookerBeeAgent (Evaluation and Selection): Produces prompts that evaluate the candidates generated and refined by the other agents. Using a fitness score based on the Adjusted Rand Index (ARI), it selects the top‑k promising solutions, maintains algorithm diversity, and avoids duplicates. This reinforces the colony's focus on the strongest candidates.
In essence, the Python code defines the task goal, parameters, constraints, and return values as text within the prompts. The generative AI model (Gemini) then "reads" and "understands" these instructions to produce or modify the actual numerical and categorical parameter values for the clustering algorithms. Different LLMs may respond differently to subtle changes in the input text, so it is important to experiment with the wording of the prompts for the three agent classes. To refine the wording further, you can always consult your chosen LLM.
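As an illustration of that structure, here is a minimal sketch of what a BeeAgent base class might look like. The names and the model string are assumptions for the example; the actual classes live in the notebook.
```python
import google.generativeai as genai

class BeeAgent:
    """Hypothetical base class: shared Gemini setup and candidate tracking."""

    def __init__(self, name, model_name="gemini-1.5-flash"):
        # Assumes genai.configure(api_key=...) was called beforehand.
        self.name = name
        self.model = genai.GenerativeModel(model_name)
        self.candidates = []  # candidate dicts produced or adopted by this agent

    def ask(self, prompt):
        # Send this agent's role-specific prompt and return the raw response text.
        response = self.model.generate_content(prompt)
        return response.text

    def track(self, new_candidates):
        # Keep only candidates not already tracked (simple string-keyed dedup).
        seen = {str(c) for c in self.candidates}
        self.candidates += [c for c in new_candidates if str(c) not in seen]
```
Each concrete agent (Scout, Employed, Onlooker) would then build its own prompt and parse the returned JSON into candidate dictionaries.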
🪻Iris Dataset
A natural choice for this study is Sir Ronald Fisher's classic Iris flower dataset, introduced in his 1936 paper. In the following sections, this dataset serves as a small, well‑defined demonstration case to illustrate how the proposed ABC optimization method can be applied within the context of a clustering problem.
The Iris dataset (License: CC0 1.0) comprises 150 labeled samples, each belonging to one of 3 Iris classes: Iris Setosa, Iris Versicolor, and Iris Virginica. Each flower sample is associated with 4 numeric features:
- sepal length (cm)
- sepal width (cm)
- petal length (cm)
- petal width (cm)
As shown in both the pairwise relationship plots and the mutual information feature‑importance plots, petal length and petal width are by far the most informative features when measured against the target labels of the Iris dataset.
Mutual Information (MI) is computed feature‑wise with respect to the labels, whereas the Adjusted Rand Index (ARI), used in this project for fitness evaluation, measures the agreement between two partitions (predicted cluster labels versus true labels). Note that even when feature selection is applied, since Iris Versicolor and Iris Virginica share similar petal lengths and widths, their clusters overlap in feature space. Consequently, the ARI can be strong but cannot reach a perfect score of 1.0.
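Both metrics are available in scikit-learn. The sketch below computes feature-wise MI and the ARI of a plain KMeans partition on Iris; exact values may vary slightly across library versions.
```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)
feature_names = load_iris().feature_names

# Mutual information: one score per feature against the true labels.
mi = mutual_info_classif(X, y, random_state=42)
for name, score in zip(feature_names, mi):
    print(f"{name}: {score:.3f}")  # petal features score highest

# ARI: agreement between a full partition and the true labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print("ARI:", round(adjusted_rand_score(y, labels), 3))
```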
❄ Clustering – No labels? No problem!
Clustering algorithms are a cornerstone of unsupervised learning, so I chose to focus on blindly determining the flower classes based solely on their features. In other words, the model was not trained on the flower labels; those labels were used only to validate performance metrics. Traditional clustering algorithms such as KMeans or DBSCAN often struggle with parameter sensitivity and dataset variability. Therefore, a meta-heuristic like ABC, which balances exploration vs. exploitation, appears promising.
Note that in clustering algorithms, parameters should technically be called hyperparameters, because they are not learned from the data during training (as weights in a neural network or regression coefficients are) but are set externally. However, for brevity, they are commonly referred to as parameters.
Here's a concise visual comparison of different clustering algorithms applied to several toy datasets; the colours represent the different clusters that each algorithm found in the 2D representations:

In the classic Iris dataset, the two most similar species — Versicolor and Virginica — often pose a challenge for clustering algorithms. Many methods mistakenly group them into a single cluster, treating them as one continuous dense region. In contrast, the more distinct Setosa is consistently identified as a separate cluster.
Table comparing several popular clustering algorithms available in the scikit‑learn library:
| Algorithm | Summary | Key Hyperparameters | Efficiency | Accuracy |
| --- | --- | --- | --- | --- |
| KMeans | Centroid-based; partitions data into k spherical clusters; simple and fast. | n_clusters, init, n_init, max_iter, random_state, tol | Fast on medium–large datasets; scales well; benefits from multiple restarts. | Strong for well-separated, convex clusters; poor on non-convex or varying-density shapes. |
| DBSCAN | Density-based; finds arbitrarily shaped clusters and marks noise without needing k. | eps, min_samples, metric, leaf_size | Moderate; slower in high dimensions; efficient with spatial indexing. | Excellent for irregular shapes and noise; sensitive to eps and density differences. |
| Agglomerative (Hierarchical) | Builds a dendrogram by iteratively merging clusters; no fixed k until the cut. | n_clusters, affinity, linkage, distance_threshold | Slower (often O(n²)); memory-heavy for large n. | Good structural discovery; linkage choice affects results; handles non-spherical clusters. |
| Gaussian Mixture Models (GMM) | Probabilistic mixture of Gaussians using EM (Expectation Maximization); soft assignments. | n_components, covariance_type, tol, max_iter, n_init, random_state | Moderate; EM can be costly with full covariance. | High when data is near-Gaussian; flexible shapes; risk of overfitting without constraints. |
| Spectral clustering | Graph-based; embeds data via eigenvectors before clustering (often KMeans). | n_clusters, assign_labels, n_neighbors, random_state, affinity | Slow on large n due to eigen-decomposition; best for small–medium sets. | Strong for manifold/complex structures; quality hinges on graph construction and affinity. |
| MeanShift | Mode-seeking via kernel density; no need to predefine k. | bandwidth, cluster_all, max_iter, n_jobs | Slow; expensive with many points/features. | Good for finding cluster modes; performance highly dependent on bandwidth choice. |
K‑Means as a Basic Clustering Example
K‑Means is among the most widely used clustering algorithms, valued for its simplicity and efficiency. Because of its prevalence, I'll outline it here in more detail as a representative example of how clustering is typically performed. It does have limitations, though: a key drawback is that the number of clusters k must be specified upfront.
How K‑Means Works
- Initialize Centroids: Select k starting centroids, either randomly or with smarter strategies like K‑Means++, which spreads them out to improve clustering quality.
- Assign Points to Clusters: Represent each data point as an n-dimensional vector, where each component corresponds to one feature. Assign points to the nearest centroid using a distance metric (commonly Euclidean). In high‑dimensional spaces, this step is complicated by the Curse of Dimensionality, where distances lose discriminative power.
- Update Centroids & Repeat: Recompute each centroid as the mean of all points in its cluster, then reassign points to the nearest centroid. Repeat until assignments stabilize — this is convergence.
Practical Considerations
- Curse of Dimensionality: In very high dimensions, distance metrics become less effective, reducing clustering reliability.
- Dimensionality Reduction: Techniques like PCA or t‑SNE are often applied before K‑Means to simplify the feature space and improve results.
- Selecting K: Methods such as the Elbow Method, Silhouette Score, or meta‑heuristics (e.g., ABC optimization) help estimate the optimal number of clusters (see the sketch below).
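As a quick illustration of the Silhouette approach, the sketch below scores k = 2…6 on Iris. On this dataset, k = 2 often scores highest by silhouette even though there are three species, which is exactly why meta-heuristics and external validation remain useful.
```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = load_iris(return_X_y=True)
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labels):.3f}")
```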
🏋️Fitness Model for Clustering
The FitnessModel evaluates clustering candidate solutions on a dataset. The goal of a clustering algorithm is to produce clusters that map closely to the true classes, but the match is often imperfect. ARI (Adjusted Rand Index) is used to measure the similarity between two clusterings (predicted vs. ground truth) – it is a widely used metric for evaluating clustering performance because it corrects for chance agreement, works across different clustering algorithms, and provides a clear scale from −1 to +1 that is easy to interpret.
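A minimal sketch of the evaluation at the heart of such a fitness model might look as follows, assuming scikit-learn estimators that expose fit_predict; this is not the project's exact implementation.
```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

def fitness(model, X, y_true):
    # Fit the candidate clustering model, then score it: lower fitness is better.
    labels = model.fit_predict(X)
    return 1.0 - adjusted_rand_score(y_true, labels)

X, y = load_iris(return_X_y=True)
print(fitness(KMeans(n_clusters=3, n_init=10, random_state=42), X, y))
```
The table below summarizes how the resulting ARI values should be read.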
| ARI Range | Meaning | Typical Edge Case Scenario |
| --- | --- | --- |
| +1.0 | Perfect agreement | Predicted clustering exactly matches the ground truth labels |
| ≈ 0.0 | Random clustering (chance level) | Assignments are random, or all points are forced into one cluster (unless the ground truth is also a single cluster) |
| < 0.0 | Worse than random | Systematic disagreement (clusters consistently mismatched or flipped), or each point placed in its own cluster when the ground truth differs |
| Low/Negative (near −1) | Strong disagreement | Extreme imbalance or mislabeling across clusters |
Fitness = 1 – ARI, so lower fitness is better. This allows ABC to directly optimize clustering quality. Shown below is an example run of the initial iterations of the ABC with Gemini Agents that I developed, including a preview of the raw LLM response texts. Note how GMM (Gaussian Mixture Models) steadily improves as new candidates are chosen in each iteration by the different bee agents. Refer to the Google Colab notebook for the logs of additional iterations.
```
Starting ABC run with Fitness Model for dataset: Iris
Features: 4, Classes: 3
Baseline Models (ARI): {'DBSCAN': 0.6309344087637648, 'KMeans': 0.6201351808870379, 'Agglomerative': 0.6153229932145449, 'GMM': 0.5164585360868599, 'Spectral': 0.6451422031981431, 'MeanShift': 0.5681159420289855}
Runner: Initiating Scout Agent for initial solutions...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Runner: Scout Agent returned 12 initial solutions.
Runner: Starting iteration 1/8...
Runner: Agents completed actions for iteration 1.
--- Iteration 1 Details ---
GMM Candidate 1 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 2 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 3 (Origin: Scout-10004): Best previous ARI=0.550, Current ARI=0.550, Params: {'eps': 0.7, 'min_samples': 4}
GMM Candidate 4 (Origin: Scout-10009) : Best previous ARI=0.820, Current ARI=0.516, Params: {'n_components': 3, 'covariance_type': 'full', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 5 (Origin: Scout-10001): Best previous ARI=0.620, Current ARI=0.462, Params: {'n_clusters': 4, 'init': 'random', 'n_init': 10, 'random_state': 42}
DBSCAN Candidate 6 (Origin: Scout-10003): Best previous ARI=0.550, Current ARI=0.442, Params: {'eps': 0.5, 'min_samples': 5}
KMeans Candidate 7 (Origin: Scout-10002): Best previous ARI=0.620, Current ARI=0.435, Params: {'n_clusters': 5, 'init': 'k-means++', 'n_init': 5, 'random_state': 42}
DBSCAN Candidate 8 (Origin: Scout-10005): Best previous ARI=0.550, Current ARI=0.234, Params: {'eps': 0.4, 'min_samples': 6}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}
-----------------------------
Runner: Starting iteration 2/8...
Scout Generating initial candidate solutions...
Scout : Sending prompt to Gemini model... n_candidates=12
Employed Refining current solutions...
Employed : Sending prompt to Gemini model... n_variants=12
Onlooker Evaluating candidates and selecting promising ones...
Onlooker : Sending prompt to Gemini model... top_k=5
Scout : Received response from Gemini model.
Scout : Raw response text: ```json[{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":4,"init":"random","n_init":10,"random_state":42}},{"model":"KMeans","params":{"n_clusters":5,"init":"k-mean...
Scout : Initial candidates generated.
Employed : Received response from Gemini model.
Employed : Raw response text: ```json[{"model":"GMM","params":{"n_components":5,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"GMM","params":{"n_components":3,"covariance_type":"full","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_cluster...
Employed : Solutions refined.
Onlooker : Received response from Gemini model.
Onlooker : Raw response text: ```json[{"model":"GMM","params":{"n_components":4,"covariance_type":"tied","max_iter":100,"random_state":42}},{"model":"KMeans","params":{"n_clusters":3,"init":"k-means++","n_init":10,"random_state":42}},{"model":"DBSCAN","params":{"eps":0.7,"min_sam...
Onlooker : Promising candidates selected.
Runner: Agents completed actions for iteration 2.
--- Iteration 2 Details ---
GMM Candidate 1 (Origin: Scout-10022) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 2 (Origin: Scout-10010) : Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 3 (Origin: Onlooker-30000): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
GMM Candidate 4 (Origin: Employed-20007): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 80, 'random_state': 42}
GMM Candidate 5 (Origin: Employed-20006): Best previous ARI=0.820, Current ARI=0.820, Params: {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 120, 'random_state': 42}
GMM Candidate 6 (Origin: Employed-20000): Best previous ARI=0.820, Current ARI=0.693, Params: {'n_components': 5, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}
KMeans Candidate 7 (Origin: Scout-10012): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
KMeans Candidate 8 (Origin: Scout-10000): Best previous ARI=0.620, Current ARI=0.620, Params: {'n_clusters': 3, 'init': 'k-means++', 'n_init': 10, 'random_state': 42}
*** Global Best so far: ARI=0.820, Candidate={'model': 'GMM', 'params': {'n_components': 4, 'covariance_type': 'tied', 'max_iter': 100, 'random_state': 42}, 'origin_agent': 'Scout-10010', 'current_ari_for_display': 0.8202989638185834}
```
🤔Confusion Matrix as a Diagnostic Tool

While the Adjusted Rand Index (ARI) provides a single score for clustering quality, the Confusion Matrix reveals where misclassifications occur by showing how true classes are distributed across predicted clusters.
In the Iris dataset, scikit‑learn encodes the species in a fixed order:
0 = Setosa, 1 = Versicolor, 2 = Virginica.
Even though there are only three true species, the algorithm below mistakenly produced four clusters. The matrix illustrates this mismatch:
```
[[ 0  6 44  0]
 [ 2  0  0 48]
 [49  0  0  1]
 [ 0  0  0  0]]
```
⚠️ Note: The order of the columns (clusters) doesn't necessarily correspond to the order of the rows (true classes). Cluster IDs are arbitrary labels assigned by the algorithm, and they don't carry any inherent meaning.
Row-by-row Interpretation (row and column IDs start from 0)
- Row 0: [ 0 6 44 0] — Setosa class → Its samples fall only into columns 1 and 2, with no overlap with Versicolor or Virginica. These two columns should really have been recognized as a single cluster corresponding to Setosa.
- Row 1: [ 2 0 0 48] — Versicolor class → Split between columns 0 and 3, showing that the algorithm didn't isolate Versicolor cleanly.
- Row 2: [49 0 0 1] — Virginica class → Also split between columns 0 and 3, overlapping with Versicolor rather than forming its own distinct cluster.
- Row 3: [ 0 0 0 0] — Extra, mistaken cluster → No true samples fall here, reflecting that the algorithm produced 4 clusters for a dataset with only 3 classes.
📌The confusion matrix shows that Setosa is distinct (its clusters don't overlap with the other species), while Versicolor and Virginica aren't separated cleanly – both are spread across the same two clusters (columns 0 and 3). This overlap highlights the algorithm's difficulty in distinguishing between them. The confusion matrix makes these misclassifications visible in a way that a single ARI score cannot.
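To reproduce this kind of diagnostic yourself, a sketch along these lines works; the exact counts depend on the algorithm and parameters, so they may differ from the matrix above.
```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
# Deliberately ask for four clusters to mimic the "extra cluster" mismatch above.
pred = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Rows = true classes (0=Setosa, 1=Versicolor, 2=Virginica), columns = cluster IDs.
print(confusion_matrix(y, pred))
```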
🏃Running the Agentic AI Loop
The Runner orchestrates iterations:
- Scout bees propose diverse solutions.
- Employed bees refine them.
- Onlooker bees select promising ones.
- The solution pool is updated.
- The best ARI per iteration is tracked.
In the Runner class, and throughout the Artificial Bee Colony (ABC) algorithm, a candidate refers to a specific clustering model along with its defined parameters. In the example solution pool shown below, two candidates are returned.
Candidates are orchestrated using Python's concurrent.futures.ThreadPoolExecutor, which enables parallel execution. Consequently, the ScoutBeeAgent, EmployedBeeAgent, and OnlookerBeeAgent run asynchronously in separate threads during each iteration of the algorithm.
The runner.run() method returns two objects:
- solution_pool: This is a list of the pool_size most promising candidates (each being a dictionary containing a model and its parameters) found across all iterations. This list is sorted by fitness (ARI), so the very first element, solution_pool[0], represents the best-fitting model and its specific parameters that the ABC algorithm discovered.
- best_history: This is a list that tracks only the best Adjusted Rand Index found in each iteration.
For instance:
```python
solution_pool = [
    {
        "model": "KMeans",
        "params": {"n_clusters": 3, "init": "k-means++"},
        "origin_agent": "Employed",
        "current_ari_for_display": 0.742
    },
    {
        "model": "AgglomerativeClustering",
        "params": {"n_clusters": 3, "linkage": "ward"},
        "origin_agent": "Onlooker",
        "current_ari_for_display": 0.715
    }
]
best_history = [
    {"ari": 0.642, "model": "KMeans", "params": {"n_clusters": 3, "init": "random"}},
    {"ari": 0.742, "model": "KMeans", "params": {"n_clusters": 3, "init": "k-means++"}}
]
```
Solution Pool Setup with ThreadPoolExecutor
ThreadPoolExecutor(): Initializes a pool of worker threads that can execute tasks concurrently.
ex.submit(…): Submits each agent’s act method as a separate task to the thread pool.
```python
from concurrent.futures import ThreadPoolExecutor
import copy

# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use threads instead of processes
        with ThreadPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
```
Each agent's act method is dispatched to the thread pool, allowing the agents to run in parallel. The call to f.result() ensures that the Runner waits for all tasks to finish before moving forward.
This design achieves two things:
- Parallel execution within an iteration — agents act concurrently, mimicking real bee colony behavior.
- Sequential iteration control — the Runner only advances once all agents have completed their work, keeping the overall loop orderly and deterministic.
From the Runner's perspective, iterations still appear sequential, but internally each iteration benefits from the concurrent execution of agent tasks.
Solution Pool Setup with ProcessPoolExecutor
While ThreadPoolExecutor provides concurrency through threads, it can be seamlessly replaced with ProcessPoolExecutor to achieve true parallel CPU execution.
With ProcessPoolExecutor, each agent runs in its own separate process, which bypasses Python's GIL (Global Interpreter Lock). The GIL is a mutex (mutual exclusion lock) that ensures only one thread executes Python bytecode at a time, even on multi‑core systems. By using processes instead of threads, heavy numerical workloads can fully leverage multiple CPU cores, enabling real parallelism and improved performance for compute‑intensive tasks.
```python
from concurrent.futures import ProcessPoolExecutor
import copy

# ... inside Runner.run() ...
for it in range(iterations):
    print(f"Runner: Starting iteration {it+1}/{iterations}...")
    if it == 0:
        results = []
    else:
        # Use processes instead of threads
        with ProcessPoolExecutor() as ex:
            futures = [
                ex.submit(self.scout.act),
                ex.submit(self.employed.act, solution_pool),
                ex.submit(self.onlooker.act, solution_pool)
            ]
            results = [f.result() for f in futures]
    print(f"Runner: Agents completed actions for iteration {it+1}.")
    # ... rest of the loop unchanged ...
```
Key Differences: ProcessPoolExecutor vs ThreadPoolExecutor
- ProcessPoolExecutor launches separate Python processes, not threads.
- Each agent runs independently on a different CPU core.
- This avoids the GIL, so CPU‑bound tasks (like clustering, fitness evaluation, numerical optimization) truly run in parallel. A CPU‑bound task is any computation where the limiting factor is the processor's speed rather than waiting for input/output (I/O).
- Since processes run in separate memory spaces, they can't directly share objects. Instead, anything passed between them must be serialized (pickled). Simple Python objects like dictionaries, lists, strings, and numbers are picklable, so candidate dictionaries can be exchanged safely.
📌Key Takeaway:
✅ Use ProcessPoolExecutor if your agents do heavy computation (matrix ops, clustering, ML training).
❌ Stick with ThreadPoolExecutor if your agents are mostly I/O‑bound (waiting for data, network, disk).
Why are some of the candidate parameter values repeated in different iterations?
The repetition of candidate parameter values across iterations is a natural outcome of how the Artificial Bee Colony algorithm works and how the agents interact:
Scout Bee Agent's Exploration: The ScoutBeeAgent is tasked with generating new and diverse candidate solutions. While it aims for diversity, given a limited parameter space, or if the generative model finds certain parameter combinations consistently effective, it may suggest similar solutions in different iterations.
Employed Bee Agent's Exploitation: The EmployedBeeAgent refines existing promising solutions. If a solution is already very good or near an optimal configuration, the "local neighborhood" exploration (e.g., adjusting parameters by ±10–20%) might lead back to the same or very similar parameter values, especially after rounding or if the parameter adjustments are small.
Onlooker Bee Agent's Selection: The OnlookerBeeAgent selects the top_k most promising solutions from a larger set of candidates (which includes newly scouted, employed-refined, and previously promising solutions). If the algorithm is converging, or if several distinct solutions yield very similar high-fitness scores, the OnlookerBeeAgent might repeatedly select parameter sets that are effectively equivalent from one iteration to the next.
Solution Pool Management: The Runner maintains a solution_pool of a fixed pool_size. It sorts this pool by fitness and keeps the best candidates. If the top solutions remain consistently the same, or if new good solutions are identical to previous ones, those parameter sets will persist and thus be "repeated" in the iteration details.
Convergence: As the ABC algorithm progresses, it is expected to converge toward optimal or near-optimal solutions. This convergence often means that the search space narrows, and agents repeatedly find the same high-performing parameter configurations unless some form of pruning (like the deduplication sketched below) is applied.
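A deduplication step of this kind can be as simple as keying each candidate by its model name and a canonical form of its parameters. Here is a minimal sketch; dedupe is an illustrative helper, not a function from the project.
```python
import json

def dedupe(candidates):
    # Drop candidates whose (model, params) pair has already been seen.
    seen, unique = set(), []
    for c in candidates:
        key = (c["model"], json.dumps(c["params"], sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(c)
    return unique

pool = [{"model": "KMeans", "params": {"n_clusters": 3, "init": "k-means++"}},
        {"model": "KMeans", "params": {"init": "k-means++", "n_clusters": 3}}]
print(len(dedupe(pool)))  # 1, since sort_keys makes key order irrelevant
```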
📊Reporting Results
Benchmarking Standard Clustering Algorithms
Before applying ABC, it is helpful to establish a baseline by evaluating the performance of standard clustering methods. I ran a comparison benchmark using default configurations for the following algorithms:
- KMeans
- DBSCAN
- Agglomerative Clustering
- Gaussian Mixture Models (GMM)
- Spectral Clustering
- MeanShift
As shown in the Google Colab notebook, the ABC agents discovered parameter sets that significantly improved the Adjusted Rand Index (ARI), reducing misclassifications between the closely related classes Versicolor and Virginica.
Reporter Outputs
The Reporter class is responsible for generating the final evaluation outputs after running the Artificial Bee Colony (ABC) optimization. It provides three primary functions:
- Comparison Table
- Compares each candidate solution’s Adjusted Rand Index (ARI) against baseline clustering models.
- Reports the improvement (candidate_ari – baseline_ari).
- Confusion Matrix Display
- Prints the confusion matrix of the best candidate solution to show class-level performance and misclassifications.
- Convergence Visualization
- Plots the progression of the best ARI across iterations (a minimal sketch follows this list).
- Annotates the plot with model names and parameters for each iteration.
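A minimal sketch of the convergence plot, using the best_history structure shown earlier; plot_convergence is an illustrative helper, not the project's exact Reporter code.
```python
import matplotlib.pyplot as plt

def plot_convergence(best_history):
    # One point per iteration: the best ARI so far, annotated with the model name.
    aris = [h["ari"] for h in best_history]
    plt.plot(range(1, len(aris) + 1), aris, marker="o")
    for i, h in enumerate(best_history):
        plt.annotate(h["model"], (i + 1, h["ari"]),
                     textcoords="offset points", xytext=(5, 5))
    plt.xlabel("Iteration")
    plt.ylabel("Best ARI")
    plt.title("ABC convergence")
    plt.show()

plot_convergence([
    {"ari": 0.642, "model": "KMeans", "params": {"n_clusters": 3}},
    {"ari": 0.742, "model": "KMeans", "params": {"n_clusters": 3}},
])
```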
💬Designing Agent Prompts for Gemini
I decided to design each agent's prompt with the following template for a structured approach (an illustrative example follows the list):
• Task Goal: What the agent must achieve.
• Parameters: Inputs like the dataset name, the number of candidates for the agent type, the allowed algorithms, and the hyperparameter input dictionary returned by the WebResearcher via its LLM prompt.
• Constraints: Ensure each candidate is unique, maintain a balanced distribution across algorithms, and require hyperparameters to stay within valid ranges.
• Return Values: JSON list of candidate solutions.
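For illustration, a Scout-style prompt assembled from this template might look like the following. The variable values are examples only; the exact wording used in the project is in the notebook.
```python
n_candidates = 12
dataset_name = "Iris"
allowed_algorithms = ["KMeans", "DBSCAN", "GMM"]
hyperparam_summary = {"KMeans": ["n_clusters", "init", "n_init"],
                      "DBSCAN": ["eps", "min_samples"],
                      "GMM": ["n_components", "covariance_type"]}

scout_prompt = f"""
Task Goal: Generate {n_candidates} diverse candidate clustering solutions for the {dataset_name} dataset.
Parameters: Allowed algorithms: {allowed_algorithms}. Hyperparameter reference: {hyperparam_summary}.
Constraints: Each candidate must be unique; balance candidates across algorithms; keep hyperparameters within valid ranges.
Return Values: ONLY output a JSON list of {{"model": ..., "params": {{...}}}} dictionaries.
"""
print(scout_prompt)
```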
To ensure deterministic LLM behavior, I used this generation_config. Specifically, note that a temperature of zero leaves the model with no room for creativity between prompts, so it simply repeats the previous response.
```python
generation_config = {
    "temperature": 0.0,
    "top_p": 1.0,
    "top_k": 1,
    "max_output_tokens": 4096
}
res = genai_model.generate_content(prompt, generation_config=generation_config)
```
While developing new code, as in this project, it is important to ensure that the same input produces the same output.
⚠️Gemini Agentic AI Issues
Gemini AI Model Types
- Lite (Flash‑Lite): Prioritizes speed and cost efficiency. Ideal for bulk tasks like translation or classification.
- Flash: Well‑suited for production workloads requiring scale and moderate reasoning.
- Pro: The flagship tier – best for complex reasoning, multimodal comprehension (text, images, audio, video), and agentic AI use cases.
Why Prompts Alone Fail in Lite Models
I ran into a common limitation of the "Lite" models:
LLMs don't reliably obey instructions like "always include exactly these parameters" simply because you place them in the prompt. As of today, models often revert to defaults or minimal sets unless structure is enforced after generation. Why the explicit prompt still failed:
- Natural language instructions are weak constraints. Even "always include exactly these parameters" is interpreted probabilistically.
- No schema enforcement. When parsing JSON, you need to validate that required keys exist.
- Deduplication addresses duplicates, not gaps. It eliminates identical candidates but doesn't restore missing parameters.
📌Key Takeaway: Prompts alone won't guarantee compliance. You need prompt + schema enforcement to ensure outputs consistently include required parameters.
Prompt Compliance Issues and Schema Solutions
Models can prioritize other parts of the prompt or simplify outputs despite emphasis on required items.
- Example instruction: "Return Values: ONLY output a JSON-style dictionary. The return string must be no longer than 1024 characters."
- Observed outcome: len(res_text) = 1036 – responses exceeded the limit.
- Missing fields: Required items sometimes didn't appear, even when stated clearly. Providing concrete output examples improved adherence.
- Practical fix: Pair prompts with schema enforcement (e.g., validate required keys, length checks) and post‑generation normalization to guarantee structure (see the sketch below).
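A minimal sketch of such post-generation enforcement, assuming the raw response wraps a JSON list in a Markdown code fence (as in the logs above); validate_candidates is an illustrative helper, and removeprefix/removesuffix require Python 3.9+.
```python
import json

REQUIRED_KEYS = {"model", "params"}
MAX_LEN = 1024

def validate_candidates(res_text):
    # Post-generation schema enforcement: length check, fence stripping,
    # JSON parsing, and required-key validation.
    if len(res_text) > MAX_LEN:
        raise ValueError(f"response too long: {len(res_text)} chars")
    payload = res_text.strip().removeprefix("```json").removesuffix("```").strip()
    candidates = json.loads(payload)
    for c in candidates:
        missing = REQUIRED_KEYS - c.keys()
        if missing:
            raise ValueError(f"candidate missing keys: {missing}")
    return candidates

print(validate_candidates('```json[{"model": "KMeans", "params": {"n_clusters": 3}}]```'))
```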
Empty Candidate Errors in Gemini API
Once in a while, I got this response:
That error message means the model didn't actually return any usable content in its response, so when my code tried to access response.text, there was no valid "Part" to read. The key clue is finish_reason = 2, which in Google's API signals that generation stopped without producing a readable text part.
Why it happens:
- Empty candidate: The API call succeeded, but the model produced no output.
- FinishReason = 2: Indicates the generation stopped before yielding a valid part.
- Quick accessor failure: Since response.text expects at least one valid text part, it throws an error when none exists.
How to handle it:
- Check finish_reason before accessing response.text. Only read the text if the candidate includes a valid part.
- Add fallback logic: If no text is returned, log the finish reason and retry or handle the case gracefully.
- Schema enforcement: Validate that required fields exist in the response before parsing.
📌 Key Takeaway: This isn't a network error — it's the model signaling that it stopped without generating text. You can find the full list of FinishReason values and guidance on interpreting them in Google's documentation: Generate Content API – FinishReason.
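Put together, the handling advice above might look like this defensive accessor, a sketch assuming the google-generativeai response object; adapt the attribute names to your SDK version.
```python
def safe_text(response):
    # response.text raises when no valid Part exists, so inspect the candidate first.
    if not response.candidates:
        print("Empty candidate: no output returned")
        return None
    candidate = response.candidates[0]
    if not getattr(candidate.content, "parts", None):
        # Log the finish reason so retries or fallbacks can be decided upstream.
        print(f"No text part; finish_reason={candidate.finish_reason}")
        return None
    return response.text
```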
Intermittent API Connection Errors
Once in a while, the Gemini API call failed with:
- Error: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
📌 Key Takeaway: This is a network error; it occurred without any code changes, indicating transient network or service issues. Add retries with exponential backoff, timeouts, and robust logging to capture context (request size, rate limits, finish_reason) and recover gracefully.
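A sketch of such a retry wrapper; with_retries is an illustrative helper, and the attempt counts and delays should be tuned to your rate limits.
```python
import random
import time

def with_retries(call, max_attempts=5, base_delay=1.0):
    # Retry a flaky API call with exponential backoff plus jitter.
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except OSError as err:  # covers requests' ConnectionError, which subclasses IOError
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            print(f"Attempt {attempt} failed ({err}); retrying in {delay:.1f}s")
            time.sleep(delay)

# Usage with the Gemini call shown earlier (genai_model and prompt as defined above):
# res = with_retries(lambda: genai_model.generate_content(prompt))
```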
Agent Security Considerations
One more thing to pay attention to, especially if you are using agents for corporate use – security is mission-critical!
⚠️Provide strict guardrails between Agents and the LLM. Actively prevent agents from deleting critical files, taking off‑topic actions, making unauthorized external API calls, etc.
📌 Key takeaway: Apply the Principle of Least Privilege
- Scope: Restrict each agent’s permissions strictly to its assigned task.
- Isolation: Block filesystem writes, external calls, or off‑topic actions unless explicitly authorized.
- Audit: Record all actions and require approvals for sensitive operations.
⚔️Agentic AI Competitive Landscape towards 2026
Model Providers
This table outlines how the agentic AI market is expected to develop in the near future. It highlights the main companies, emerging competitors, and the trends that will shape the space as we move toward 2026. Presented here as a non‑exhaustive list of direct competitors to Gemini, the aim is to give readers a clear picture of the strategic environment in which agentic AI is evolving.
| Provider | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| Google Gemini API | Multimodal LLM service (text, vision, code, etc.) | High‑quality generative outputs; Google Cloud integration; strong multimodal capabilities | Primarily a model API; Gemini 3 explicitly designed to support orchestration of agentic workflows |
| OpenAI GPT APIs | Text + code generation | Widely adopted; strong ecosystem; fine‑tuning options | Limited multimodal support compared to Gemini |
| Anthropic Claude | Safety‑focused text LLMs | Strong alignment and safety features; long context handling | Less multimodal capability |
| Mistral AI | Open and enterprise models | Flexible deployment; community driven; customizable | Requires infrastructure setup |
| Meta LLaMA | Open‑weight research models | Open source; strong research backing; customizable | Needs infra and ops for production |
| Cohere | Enterprise NLP and embeddings | Enterprise features; embeddings; privacy options | Narrower scope than general LLMs |
Agent Orchestration Frameworks
This table examines the management and orchestration aspects of agentic AI. It highlights how different frameworks handle coordination, reliability, and integration to enable scalable agent systems.
| Framework | Core Focus | Strengths | Notes |
| --- | --- | --- | --- |
| LangGraph | Graph‑based orchestration | Models workflows as nodes/edges; strong memory; multi‑agent collaboration | Requires developer setup; orchestration only |
| LangChain | Agent/workflow orchestration | Rich ecosystem; tool integration; memory/state handling | Can increase token usage and complexity |
| CrewAI | Role‑based crew orchestration | Role specialization; collaboration patterns; good for teamwork scenarios | Depends on external LLMs |
| OpenAI Swarm | Lightweight multi‑agent orchestration | Simple handoffs; ergonomic routines | Good for running experiments |
| AutoGen (Microsoft) | Multi‑agent framework | Research + production focus; extensible | Still evolving; requires Microsoft ecosystem |
| AutoGPT | Autonomous agent prototype | Fast prototyping; community driven | Varying production readiness |
✨Conclusion and Future Work
This project was my first experiment with Gemini's agentic AI, adapting the Artificial Bee Colony algorithm to an optimization task. Even on a small dataset, it demonstrated how LLMs can take on bee‑like roles in a meta‑heuristic process, while also revealing both the promise and the practical challenges of this approach. Feel free to copy and adapt the Google Colab notebook for your own projects.
Future Work
- Applying the ABC meta‑heuristic to larger and more diverse datasets.
- Extending the WebResearcher agent to automatically construct datasets from domain‑specific sources (e.g. Royal Botanic Gardens Kew – POWO), inspired by Sir Ronald Fisher's pioneering work in statistical botany.
- Running experiments with expanded pools of worker threads and adjusting the number of candidates per bee agent type.
- Exploring semi‑supervised clustering, where a small labeled dataset complements a larger unlabeled one.
- Comparing results from Google’s Gemini API with outputs from other providers’ APIs.
