Building a Chatbot That Answers Your Infrastructure Questions

pr0h0
chatbot, infrastructure, ai, automation
AI Usage (86%)

Building a chatbot for infrastructure questions sounds straightforward until you decide what counts as an answer. If it can name a service, quote a runbook, and point to the place where a change gets verified, it is useful. If it starts guessing about prod topology, it becomes a liability quickly.

What this chatbot should actually answer

I would keep the first version narrow. Good questions are the ones that already have answers in internal material:

  • “Which cluster runs the billing API?”
  • “What is the rollback step for the queue worker?”
  • “Where is the Redis connection string documented?”
  • “What changed in the last deploy window?”

Bad questions are open-ended or need live operational judgment:

  • “Is the system healthy?”
  • “Should I restart this service?”
  • “What is wrong with customer 1432's request?”

That split matters because the bot is not an operator. It is a retrieval layer over your system knowledge.

Pick the data sources before you pick the model

Start with docs, runbooks, and config snapshots

I usually start with sources that are stable, textual, and already reviewed by humans:

  • architecture docs
  • incident runbooks
  • service ownership pages
  • sanitized config snapshots
  • deploy notes
  • Terraform or Kubernetes manifests, if you treat them as read-only evidence

The model is the easy part. The quality comes from source selection and update cadence. If the docs are stale, the bot will be confidently stale too.

Define what stays out of scope

You should explicitly exclude:

  • secrets and credentials
  • private keys
  • ephemeral access tokens
  • raw customer data
  • interactive shell output from production hosts
  • anything that can trigger action without approval

⚠️ Do not let the chatbot retrieve secrets “for convenience.” If it can see them, assume they will eventually leak into logs, prompts, or user-visible answers.

Architecture that keeps answers grounded

Retrieval layer and indexing strategy

The simplest reliable pattern is retrieval-augmented generation:

  1. chunk your documents
  2. embed the chunks
  3. retrieve the top matches for a question
  4. ask the model to answer only from that context

For infrastructure docs, chunk by semantic section rather than by fixed size alone. A runbook step, a service definition, or a config block should usually stay intact. Split one in half and neither chunk carries enough context to be retrieved or interpreted correctly.
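
As a rough sketch of heading-based chunking, the function below splits a Markdown file at its headings and leaves fenced code blocks intact. The name splitByHeading and the returned chunk shape (id, heading, text) are assumptions for illustration, and sections with no body text are dropped.

```javascript
// Hypothetical helper: one chunk per Markdown heading section.
const FENCE = /^\s*(`{3,}|~{3,})/; // opening or closing code fence
const HEADING = /^#{1,6}\s+\S/;    // ATX heading such as "## Rollback"

function splitByHeading(markdown) {
  const sections = [];
  let current = { heading: "(preamble)", lines: [] };
  let inFence = false;

  const flush = () => {
    // Skip sections that contain only whitespace.
    if (current.lines.some((line) => line.trim() !== "")) sections.push(current);
  };

  for (const line of markdown.split("\n")) {
    // Inside a code fence, "# comment" lines are code, not headings.
    if (FENCE.test(line)) inFence = !inFence;

    if (!inFence && HEADING.test(line)) {
      flush();
      current = { heading: line.replace(/^#+\s+/, "").trim(), lines: [] };
    } else {
      current.lines.push(line);
    }
  }
  flush();

  return sections.map((section, index) => ({
    id: index,
    heading: section.heading,
    text: `${section.heading}\n${section.lines.join("\n").trim()}`,
  }));
}
```

Keeping the heading inside the chunk text is a deliberate choice here: a step like "restart the worker" is much easier to retrieve when the chunk also says which runbook section it belongs to.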

Metadata helps more than people expect:

Metadata       Why it matters
service        filters answers to the right system
environment    avoids mixing dev and prod
lastUpdated    helps with freshness checks
sourceType     docs, runbook, config, incident note
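
To show why those fields earn their keep, here is a small in-memory filter that narrows candidate chunks by metadata before any similarity ranking. The chunk shape, the sample data, and the filterByMetadata name are all hypothetical; many vector stores have their own filter syntax, which you would use instead.

```javascript
// Hypothetical chunk shape: { id, text, metadata: { service, environment, ... } }
function filterByMetadata(chunks, wanted) {
  // Keep a chunk only if every requested key matches exactly.
  return chunks.filter((chunk) =>
    Object.entries(wanted).every(([key, value]) => chunk.metadata[key] === value)
  );
}

// Example data, invented for illustration.
const chunks = [
  { id: "a", text: "Billing API runs in cluster-2", metadata: { service: "billing", environment: "prod" } },
  { id: "b", text: "Billing API runs in cluster-dev", metadata: { service: "billing", environment: "dev" } },
  { id: "c", text: "Queue worker rollback steps", metadata: { service: "queue", environment: "prod" } },
];

const prodBilling = filterByMetadata(chunks, { service: "billing", environment: "prod" });
console.log(prodBilling.map((chunk) => chunk.id)); // [ 'a' ]
```

Chunks "a" and "b" read almost identically, so similarity search alone could easily return the dev one for a prod question.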

Prompt contract and answer format

The bot should return structured answers, not freeform essays. I like a format like:

  • short answer
  • supporting evidence
  • source citations
  • confidence or uncertainty
  • follow-up action if the answer is incomplete

That contract gives you room to say “I do not know” without sounding broken.

Caching, freshness, and fallback behavior

Infrastructure changes often enough that freshness matters. Cache retrieved context briefly, not indefinitely. If a source is older than your trust window, meaning the maximum age you are willing to answer from without a warning, flag it in the answer.
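
A freshness check can be very small. The sketch below assumes each source carries an ISO 8601 updatedAt string and that the trust window is expressed in days; annotateFreshness and the stale and ageDays fields are names I made up for the example.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumption: each source has an ISO 8601 `updatedAt` string.
function annotateFreshness(sources, trustWindowDays, now = new Date()) {
  return sources.map((source) => {
    const ageMs = now.getTime() - new Date(source.updatedAt).getTime();
    const ageDays = Math.floor(ageMs / DAY_MS);
    // A missing or unparseable date yields NaN, which is treated as stale.
    const stale = Number.isNaN(ageDays) || ageDays > trustWindowDays;
    return { ...source, ageDays, stale };
  });
}
```

Treating an unknown date as stale is the conservative default: a document nobody can date should not be answered from silently.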

Fallback behavior should be boring:

  • if retrieval fails, say so
  • if sources conflict, show both
  • if confidence is low, point to the owning team or runbook
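
Those three rules fit in one function. This is only a sketch: the input shape (matches, conflicts, confidence, owner) and the kind labels are assumptions, and the order of the checks is one reasonable choice rather than the only one.

```javascript
// Hypothetical input: retrieval results plus whatever conflict and confidence
// signals your pipeline produces.
function chooseFallback({ matches, conflicts, confidence, owner }) {
  if (matches.length === 0) {
    return { kind: "no_results", message: "I could not find a source for this question." };
  }
  if (conflicts.length > 0) {
    // Show every conflicting source instead of picking a winner.
    return { kind: "conflict", message: "Sources disagree. Showing all of them.", sources: conflicts };
  }
  if (confidence === "low") {
    return { kind: "handoff", message: `Low confidence. Please check with ${owner}.` };
  }
  return { kind: "answer" };
}
```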

Implementation sketch in JavaScript

Ingesting infrastructure knowledge

A basic ingest flow in JavaScript can scan Markdown docs, normalize them, and store embeddings plus metadata.

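// loadDocs, splitByHeading, and vectorStore are placeholders for your own
// loader, chunker, and vector database client.
// Top-level await requires this file to run as an ES module.
const files = await loadDocs("./infra-docs");

for (const file of files) {
  // Chunk on headings so a runbook step or config block stays in one piece.
  const chunks = splitByHeading(file.content);
  for (const chunk of chunks) {
    await vectorStore.upsert({
      // Path plus chunk ID, so re-ingesting a file overwrites its old chunks.
      id: `${file.path}:${chunk.id}`,
      text: chunk.text,
      // Metadata used later for filtering and freshness checks.
      metadata: {
        path: file.path,
        service: file.service,
        environment: file.environment,
        updatedAt: file.updatedAt
      }
    });
  }
}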

The important part is not the SDK. It is the metadata. Without it, you cannot separate “the staging API” from “the production API” when the wording is similar.

Serving a question with retrieved context

At query time, fetch the top matches and pass only those into the prompt.

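// vectorStore, buildPrompt, and llm are placeholders for your own clients.
async function answerInfraQuestion(question) {
  // Retrieve only the five closest chunks; nothing else reaches the model.
  const matches = await vectorStore.search(question, { topK: 5 });

  // Number each chunk so the answer can refer back to a specific source.
  const context = matches.map((m, i) => ({
    id: i + 1,
    text: m.text,
    source: m.metadata.path,
    updatedAt: m.metadata.updatedAt
  }));

  const prompt = buildPrompt({ question, context });
  return llm.generate(prompt);
}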

I would keep the prompt strict: answer only from the supplied context, cite each claim, and refuse to invent details.
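
One way such a prompt builder could look is sketched below. The wording of the instructions is my own and untested against any particular model, and the context shape (id, text, source, updatedAt) is assumed to match what the query function passes in.

```javascript
// Hypothetical prompt builder: numbered sources first, then the rules, then the question.
function buildPrompt({ question, context }) {
  const sources = context
    .map((c) => `[${c.id}] (${c.source}, updated ${c.updatedAt})\n${c.text}`)
    .join("\n\n");

  return [
    "You answer infrastructure questions using ONLY the sources below.",
    "Cite every claim with its source number, for example [1].",
    'If the sources do not contain the answer, reply exactly: "I do not know."',
    "Do not invent hostnames, cluster names, or commands.",
    "",
    "Sources:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Including the update date next to each source gives the model a chance to mention staleness, though I would still enforce freshness in code rather than trust the model to notice.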

Returning citations and uncertainty

Citations should be visible in the UI and machine-readable in the response. A practical shape is:

{
  answer: "The billing API runs in cluster-2.",
  citations: [
    { source: "docs/billing.md", excerpt: "Billing API is deployed to cluster-2" }
  ],
  confidence: "medium",
  note: "Runbook last updated 18 days ago"
}

If you cannot point to evidence, the bot should say that plainly. That is better than a polished hallucination.
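
Since model output is not guaranteed to follow the shape, I would check it before it reaches the UI. The validator below is a minimal sketch; validateAnswer is a hypothetical name, and the set of allowed confidence labels is my assumption, since only "medium" appears in the example.

```javascript
// Assumption: confidence is one of these three labels.
const CONFIDENCE_LEVELS = ["low", "medium", "high"];

function validateAnswer(response) {
  const errors = [];

  if (typeof response.answer !== "string" || response.answer.trim() === "") {
    errors.push("answer must be a non-empty string");
  }
  if (!Array.isArray(response.citations) || response.citations.length === 0) {
    errors.push("at least one citation is required");
  } else {
    response.citations.forEach((citation, index) => {
      const ok = citation && typeof citation.source === "string" && typeof citation.excerpt === "string";
      if (!ok) errors.push(`citation ${index} needs string source and excerpt fields`);
    });
  }
  if (!CONFIDENCE_LEVELS.includes(response.confidence)) {
    errors.push(`confidence must be one of: ${CONFIDENCE_LEVELS.join(", ")}`);
  }
  return errors; // an empty array means the shape is acceptable
}
```

When validation fails, one option is to discard the model's text and return a plain "I do not know" with a pointer to the owning team.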

Testing for accuracy, drift, and bad assumptions

Golden questions and regression checks

Write a small set of golden questions (questions with known correct answers) before launch. Include cases that should be easy, ambiguous, and impossible, then re-run them whenever prompts, embeddings, or source data change.

Good test cases include:

  • exact runbook lookup
  • service ownership questions
  • questions with outdated wording
  • questions that intentionally cross environments

Track whether the bot cites the right source and whether it answers the question you asked, not the adjacent one.
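
A regression check along those lines can start as two small functions. This sketch assumes responses follow the answer-plus-citations shape shown earlier and that an "impossible" question should produce a refusal with no citations; the golden case fields (expectSource, expectRefusal) are hypothetical.

```javascript
// Hypothetical golden case: { question, expectSource } or { question, expectRefusal: true }.
function checkGolden(golden, response) {
  const cited = response.citations.map((citation) => citation.source);
  const pass = golden.expectRefusal
    ? cited.length === 0 && /do not know/i.test(response.answer)
    : cited.includes(golden.expectSource);
  return { question: golden.question, pass, cited };
}

// Collapse individual results into something a CI job can fail on.
function summarize(results) {
  const failed = results.filter((result) => !result.pass);
  return { total: results.length, failed: failed.map((result) => result.question) };
}
```

Checking the cited source rather than the answer text keeps the test stable when the model rephrases itself, which it will.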

Conflicting sources and stale data

This is where infra bots usually fail. One doc says the service is in AWS. Another says it moved to GCP last quarter. The bot needs a policy:

  • prefer the newest source
  • surface the conflict
  • ask for manual verification when the conflict affects production

If you do not encode this, the model will quietly pick one.
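
Encoded as code, that policy might look like the sketch below. It assumes an earlier step has already extracted comparable claims from the retrieved chunks, which is the hard part and is not shown. The claim shape (value, source, updatedAt, environment) and the "production" label are assumptions.

```javascript
// Hypothetical claim shape: { value, source, updatedAt, environment }
function resolveConflict(claims) {
  const distinctValues = new Set(claims.map((claim) => claim.value));
  if (distinctValues.size <= 1) {
    return { status: "agreed", value: claims.length > 0 ? claims[0].value : null, claims };
  }

  // Newest first, so the preferred claim is at index 0 and the rest stay visible.
  const newestFirst = [...claims].sort(
    (a, b) => new Date(b.updatedAt).getTime() - new Date(a.updatedAt).getTime()
  );

  return {
    status: "conflict",
    preferred: newestFirst[0],
    claims: newestFirst,
    // Anything touching production gets a human check, whatever the dates say.
    needsManualVerification: claims.some((claim) => claim.environment === "production"),
  };
}
```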

Operational guardrails

Access control and secrets hygiene

Use the same access model the docs already have. If a user cannot open a runbook in your wiki, the chatbot should not reveal it either. Also sanitize source material before indexing. Remove secrets, tokens, and any value that should never appear in a prompt or log.
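
Both halves can be sketched briefly: a redaction pass that runs before indexing, and an access filter that runs at query time. The patterns are illustrative only and will miss plenty of secret formats, so treat this as a starting point and not a replacement for a maintained secret scanner. The allowedGroups metadata field is an assumption about how your wiki permissions get exported.

```javascript
// Illustrative patterns only; this list is far from exhaustive.
const REDACTIONS = [
  // PEM-encoded private key blocks.
  {
    pattern: /-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g,
    replacement: "[REDACTED KEY]",
  },
  // AWS access key IDs: the prefix AKIA followed by 16 characters.
  { pattern: /\bAKIA[0-9A-Z]{16}\b/g, replacement: "[REDACTED]" },
  // key=value or key: value pairs whose key name looks sensitive.
  {
    pattern: /([\w-]*(?:password|passwd|secret|token|api[_-]?key)[\w-]*)(\s*[:=]\s*)\S+/gi,
    replacement: "$1$2[REDACTED]",
  },
];

function redactSecrets(text) {
  return REDACTIONS.reduce(
    (out, { pattern, replacement }) => out.replace(pattern, replacement),
    text
  );
}

// Assumption: each chunk lists the groups allowed to read its source document.
function filterByAccess(chunks, userGroups) {
  return chunks.filter((chunk) =>
    chunk.metadata.allowedGroups.some((group) => userGroups.includes(group))
  );
}
```

The key-value pattern errs on the side of over-redacting (a line like "tokenizer: fast" would lose its value too), which I think is the right trade for material headed into prompts and logs.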

Logging, rate limits, and human escalation

Log the question, retrieved document IDs, and answer metadata. Do not log raw secrets or unnecessary context text unless you have a clear retention policy.
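
A log entry built on that principle might look like this. It is a sketch: the field names are my own, and it assumes matches carry an id and responses follow the citation shape shown earlier.

```javascript
// Hypothetical log entry: identifiers and metadata only, no chunk text.
function buildLogEntry({ userId, question, matches, response }, now = new Date()) {
  return {
    timestamp: now.toISOString(),
    userId,
    question,
    retrievedIds: matches.map((match) => match.id),
    confidence: response.confidence,
    citedSources: response.citations.map((citation) => citation.source),
    // Chunk text and the full answer body are intentionally left out.
  };
}
```

With the retrieved IDs in the log you can still reconstruct what the bot saw, as long as the index is versioned, without copying document contents into a second system.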

Rate limits stop abuse, but escalation matters more. If the bot sees a production incident question, route it toward the human on call instead of pretending to diagnose. A good chatbot supports the ops team. It does not replace them.
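
The routing itself can start as a crude keyword check in front of the bot. The patterns below are guesses at what incident questions look like and would need tuning against real traffic; routeQuestion and the route labels are hypothetical.

```javascript
// Heuristic hints that a question is about a live incident. Tune these on real questions.
const INCIDENT_HINTS = [
  /\b(outage|incident|sev[ -]?[0-9])\b/i,
  /\bis\s+\S+(\s+\S+)?\s+down\b/i,
  /\b(failing|erroring|timing out)\b.*\bprod(uction)?\b/i,
];

function routeQuestion(question) {
  const looksLikeIncident = INCIDENT_HINTS.some((pattern) => pattern.test(question));
  if (looksLikeIncident) {
    return {
      route: "on_call",
      message: "This looks like a live incident. Please page the on-call engineer.",
    };
  }
  return { route: "bot" };
}
```

False positives are cheap here (someone gets pointed at a human), while false negatives mean the bot tries to diagnose production, so I would bias the patterns toward escalating.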

What to improve after the first version

After launch, I would tune in this order:

  1. retrieval quality
  2. chunking strategy
  3. source freshness checks
  4. answer formatting
  5. conflict handling

People often start by tuning prompts. In practice, the biggest gains usually come from better source curation and stricter retrieval rules.

Conclusion

A useful infrastructure chatbot is less about “AI” and more about evidence. If you can keep it grounded in current docs, restrict its scope, and force it to show where each answer came from, it becomes a real internal tool instead of a demo.

The rule I keep coming back to is simple: if the bot cannot cite it, it should not claim it.
