Building BugInsights: How We Won a 30-Team Hackathon with HDBSCAN
Cisco engineers file a lot of bugs. Over the years, those reports accumulate into a database that contains genuine signal — recurring failure modes, systemic architectural problems, components that keep breaking — but finding that signal manually is impossible. BugInsights was our attempt to automate it.
The Problem
Engineering managers want to know: which components keep failing? Are there classes of bugs we keep re-filing? Is there a subsystem that’s disproportionately unstable? The answers exist in the bug database, but they’re buried under inconsistent formatting, vague titles, and years of accumulated noise.
Classic keyword search doesn’t work. "Interface flap" and "link going down" describe the same thing. "Memory leak in telemetry" and "TMC process OOM" might be the same bug filed twice by different engineers using different vocabulary.
The Stack
We built a pipeline:
Step 1 — Summarization. Raw bug reports are messy. We used GPT-4o to clean and summarize each report into a structured 2-3 sentence description: what failed, what component, what impact. This normalization was critical for the next step.
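A minimal sketch of this step, assuming the official openai Python client; the prompt wording and the summary structure below are illustrative, not the exact prompt we shipped.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SUMMARY_PROMPT = (
    "Summarize this bug report in 2-3 sentences. "
    "State what failed, which component it belongs to, and the user-visible impact. "
    "Ignore boilerplate, log dumps, and triage chatter."
)

def summarize_bug(raw_report: str) -> str:
    """Normalize one raw bug report into a short, structured summary."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SUMMARY_PROMPT},
            {"role": "user", "content": raw_report},
        ],
        temperature=0.2,  # keep summaries consistent across thousands of reports
    )
    return response.choices[0].message.content.strip()
```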
Step 2 — Embedding. Each summary was embedded using OpenAI’s text-embedding model and stored in ChromaDB. ChromaDB gave us fast approximate nearest-neighbor search at hackathon speed.
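Roughly what the embedding and storage step looks like; the embedding model name and collection name are assumptions, since the post only identifies "OpenAI's text-embedding model."

```python
import chromadb
from openai import OpenAI

openai_client = OpenAI()
chroma = chromadb.PersistentClient(path="./buginsights_db")
collection = chroma.get_or_create_collection("bug_summaries")

def embed_and_store(bug_id: str, summary: str) -> None:
    """Embed one bug summary and persist it in ChromaDB."""
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",  # assumed model; not named in the post
        input=summary,
    ).data[0].embedding
    collection.add(ids=[bug_id], embeddings=[embedding], documents=[summary])
```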
Step 3 — Clustering. We ran HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) over the embedding vectors. HDBSCAN was the right choice for several reasons: it doesn’t require you to specify the number of clusters upfront, it handles noise (bugs that don’t belong to any cluster) gracefully, and it identifies clusters of varying density, which matches the real distribution of bug patterns.
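In code, the clustering step is only a few lines with the hdbscan package; min_cluster_size and the stand-in data below are illustrative, since the real vectors come out of ChromaDB.

```python
import numpy as np
import hdbscan

# Stand-in for the real embedding matrix (one row per bug summary, pulled from ChromaDB)
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 1536)).astype(np.float32)

clusterer = hdbscan.HDBSCAN(
    min_cluster_size=10,  # smallest group worth calling a recurring pattern (tunable)
    metric="euclidean",
)
labels = clusterer.fit_predict(embeddings)  # one label per bug; -1 means "noise"

n_clusters = labels.max() + 1
print(f"{n_clusters} clusters found, {(labels == -1).sum()} bugs left as noise")
```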
Step 4 — Labeling. For each cluster, we fed the top-5 representative bug summaries back to GPT-4o and asked it to generate a concise category label. "Interface driver crashes under high packet rate" is infinitely more useful than "Cluster 7."
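A sketch of the labeling call; the prompt here is paraphrased rather than the exact one we used.

```python
from openai import OpenAI

client = OpenAI()

def label_cluster(representative_summaries: list[str]) -> str:
    """Ask GPT-4o for a short, human-readable name for one cluster."""
    bullets = "\n".join(f"- {s}" for s in representative_summaries)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "These bug summaries were clustered together:\n"
                f"{bullets}\n\n"
                "Give a concise category label (under ten words) describing the shared failure pattern."
            ),
        }],
        temperature=0.2,
    )
    return response.choices[0].message.content.strip()
```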
Step 5 — Dashboard. A Streamlit app with filterable charts by component, time period, and severity. Engineering managers could drill into clusters, see the constituent bugs, and export for postmortems.
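The dashboard itself is a small amount of Streamlit; the column names and CSV layout below are hypothetical, but the shape of the app matches what we demoed.

```python
import pandas as pd
import streamlit as st

# Hypothetical export of the pipeline: one row per bug with its cluster label
df = pd.read_csv("bugs.csv", parse_dates=["filed"])

st.title("BugInsights")

components = st.sidebar.multiselect("Component", sorted(df["component"].unique()))
severities = st.sidebar.multiselect("Severity", sorted(df["severity"].unique()))

view = df
if components:
    view = view[view["component"].isin(components)]
if severities:
    view = view[view["severity"].isin(severities)]

# Biggest clusters first, so the most common failure patterns surface at the top
st.bar_chart(view["cluster_label"].value_counts().head(15))

# Drill-down: pick a cluster, inspect the constituent bugs, export for a postmortem
chosen = st.selectbox("Cluster", view["cluster_label"].unique())
st.dataframe(view[view["cluster_label"] == chosen])
```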
Why Not K-Means?
K-means requires you to specify K. We had no idea how many bug categories existed — that was the whole point. K-means also forces every point into a cluster, which means noise bugs get incorrectly assigned. HDBSCAN marks outlier bugs as noise (label -1) and only clusters the ones that genuinely belong together.
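The difference is easy to see on synthetic data; this toy comparison (not from the hackathon) just illustrates that K-means has no noise concept while HDBSCAN does.

```python
import numpy as np
import hdbscan
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two tight groups of "similar bugs" plus a handful of scattered one-offs
dense = np.vstack([rng.normal(0.0, 0.05, (50, 2)), rng.normal(1.0, 0.05, (50, 2))])
outliers = rng.uniform(-3, 4, (10, 2))
points = np.vstack([dense, outliers])

kmeans_labels = KMeans(n_clusters=2, n_init=10).fit_predict(points)
hdbscan_labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(points)

print("K-means noise points:", int((kmeans_labels == -1).sum()))   # always 0: every outlier is force-assigned
print("HDBSCAN noise points:", int((hdbscan_labels == -1).sum()))  # the scattered one-offs land here
```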
What We Learned
The embeddings did the heavy lifting. Once bugs are in a good semantic embedding space, the clustering is almost mechanical. The harder problem was prompt engineering the summarization step — getting GPT-4o to extract the right signal from wildly inconsistent bug report formats.
We won against 30+ teams. The judges said the most impressive thing was that the tool was immediately usable — no tuning, no training data, no labeling effort. That’s the real promise of LLM-powered tooling: you get surprisingly far with a well-designed pipeline around a capable foundation model.