How to Build Deep Context for AI Sales Agents

Most sales AI starts blind. We built a research pipeline that gives our scoring and personalization engines the same context your best SDR carries on day one — automatically.
Every AI-powered sales tool has the same dirty secret: it doesn't know anything about your business.
You connect your CRM. Install the tracking pixel. And then you hit a wall. Lead scoring is generic. Outreach is bland. The tool treats a Fortune 500 enterprise buyer the same as a 10-person startup — because it has no context on who you actually sell to, what makes you different, or why your customers chose you over the competition.
The typical fix? A long onboarding questionnaire. A sales enablement team filling out profile fields. A "ramp period" where the tool slowly learns from your data over weeks. We didn't want that for Knock2.
We wanted Knock2 to understand a customer's business — deeply — from the moment they sign up. To give our agents the kind of context that usually takes a rep months to accumulate. Here's how we built it.
When a customer onboards to Knock2, we automatically kick off a parallel web research pipeline using neural search APIs. The system fires a series of concurrent queries — split across two surfaces — and synthesizes everything into a structured knowledge base before the customer ever clicks through to their dashboard.
Own-domain searches cover product pages, case studies, blog content, pricing, and implementation guides. Web-wide searches pull competitive comparisons, third-party reviews, funding signals, buyer persona indicators, and urgency triggers. Everything runs in a single async pass that completes in seconds.
All search output feeds a large language model with a structured prompt. The model answers specific questions across five categories. One strict rule: answer only from the evidence provided — no hallucination. If research doesn't support a confident answer, we return null. We'd rather have a gap than a fabrication.
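The evidence-only rule is worth enforcing twice: once in the prompt, and again with a defensive post-check on the model's JSON. Below is a sketch under assumed conventions; the prompt text, field names, and `evidence_ids` citation scheme are illustrative, not Knock2's actual schema.

```python
import json

# Illustrative prompt scaffold: evidence-only answers, null for gaps.
SYNTHESIS_PROMPT = """You are building a sales knowledge base.
Answer the questions below using ONLY the evidence provided.
If the evidence does not support a confident answer, return null for that field.
Respond with JSON matching the schema exactly.

Evidence:
{evidence}

Questions (five categories):
{questions}
"""

def validate_answers(raw_json: str, evidence_ids: set[str]) -> dict:
    """Defensive post-check: null out any answer whose cited evidence
    isn't in the set we actually retrieved."""
    answers = json.loads(raw_json)
    for field, payload in answers.items():
        if payload is None:
            continue
        cited = set(payload.get("evidence_ids", []))
        if not cited or not cited <= evidence_ids:
            answers[field] = None  # prefer a gap over a fabrication
    return answers
```

The post-check matters because prompt instructions alone are not a guarantee; a cheap set-membership test catches answers the model invented or mis-attributed.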
Synthesized answers are stored with metadata — AI-generated vs. human-written, last-updated timestamp. Human overrides are respected across refresh cycles. Only AI-generated answers get re-run. Customers get full context on day one, with a clean path to refine the nuances that don't live in public content.
The knowledge base isn't a settings panel artifact. It's a live input to the two most important functions in the product — and what makes both work without a manual setup phase.
When Knock2 de-anonymizes a visitor, the scoring engine compares them against the knowledge base — not a generic model. It knows which industries you've won, which company sizes fit, which personas are in the buying committee. A mid-market fintech visitor hits differently when the system knows you've closed three named logos in that exact segment.
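In spirit, the comparison looks like the toy function below. The field names and point weights are invented for illustration; the real engine is presumably richer, but the shape is the same: score against what the knowledge base says you've actually won, not a generic model.

```python
def score_visitor(visitor: dict, kb: dict) -> int:
    """Toy scoring sketch: weight a de-anonymized visitor by how well
    they match segments the knowledge base says you've won.
    Field names and weights are illustrative assumptions."""
    score = 0
    if visitor.get("industry") in kb.get("won_industries", []):
        score += 40  # you've closed logos in this exact segment
    lo, hi = kb.get("company_size_fit", (0, 0))
    if lo <= visitor.get("employee_count", 0) <= hi:
        score += 30  # company size matches your sweet spot
    if visitor.get("persona") in kb.get("buying_committee_personas", []):
        score += 30  # this title sits on your typical buying committee
    return score
```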
When it's time to engage — AI-drafted email, Slack notification to a rep, or automated sequence — the personalization engine draws directly from the knowledge base. It knows your differentiators, your win conditions, the objections your buyers raise. The output mirrors what your best SDR already knows intuitively, at scale, on the first day.
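Concretely, "draws from the knowledge base" can mean assembling the drafting model's context from KB fields rather than from a questionnaire. A hedged sketch; the function and field names are hypothetical, not Knock2's schema:

```python
def build_outreach_prompt(kb: dict, visitor: dict) -> str:
    """Assemble the context an email-drafting model would see.
    Field names are illustrative assumptions, not a real schema."""
    return "\n".join([
        f"Prospect: {visitor['name']} ({visitor['title']} at {visitor['company']})",
        f"Our differentiators: {', '.join(kb['differentiators'])}",
        f"Common objections to preempt: {', '.join(kb['objections'])}",
        f"Win conditions: {', '.join(kb['win_conditions'])}",
        "Draft a short, specific first-touch email grounded in the context above.",
    ])
```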
We evaluated three approaches before landing on the architecture we use today. The distinction matters for anyone building context pipelines for sales agents.
| Approach | Characteristics | Verdict |
|---|---|---|
| Web Scraping | Raw HTML, breaks constantly, no semantic understanding | You get the text of a page — not its meaning |
| Standard Search APIs | Optimized for human readers, not LLM synthesis | Heavy post-processing, poor signal-to-noise ratio |
| Neural Search (Exa) | Returns semantically relevant, pre-filtered snippets | Ready for LLM synthesis without cleanup |
The key capability is domain-scoped neural search. We can ask "find competitive positioning content on this specific company's website" and retrieve their own "why us" blog posts and comparison pages — the exact content a good SDR would find manually. That's not replicable with keyword-based search.
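A domain-scoped request is just an ordinary neural search with the result set constrained to one site. The sketch below builds the request parameters only; the parameter names echo common neural search APIs (Exa exposes an `include_domains` option, for instance) but the exact shape here is an assumption.

```python
def domain_scoped_query(company_domain: str, intent: str) -> dict:
    """Build a neural-search request scoped to one company's own site.
    Parameter names are illustrative of domain-scoped search APIs."""
    return {
        "query": f'{intent} content on {company_domain}, e.g. "why us" and comparison pages',
        "include_domains": [company_domain],  # only results from this site
        "num_results": 5,
    }

req = domain_scoped_query("acme.com", "competitive positioning")
```

A keyword engine matching the literal phrase "competitive positioning" would miss a page titled "Why teams switch to us"; a semantic query scoped to the domain finds it.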
We use Exa for this layer. Their API is purpose-built for AI applications — structured output, highlight extraction, domain scoping — which removes the post-processing overhead that makes other approaches slow and brittle.
- Specific search intents produce dramatically better synthesized output. The quality delta between a broad "tell me about this company" query and intent-scoped searches is significant. Focused input → focused output.
- Sequential API calls produce unacceptable latency for an onboarding flow. Async parallelism keeps the entire research pass fast enough that customers don't notice it's happening.
- Narrative summaries are fluent but hard to consume programmatically. Specific questions with JSON-structured answers are directly usable by downstream scoring and personalization systems.
- AI research gets you 80–90% of the way. Every business has nuances that don't live in public content. The override layer fills gaps without requiring customers to start from scratch.
Right now, the research pipeline runs on the customer's own business — building the foundational context that makes scoring and personalization work from day one.
We're actively building the next layer: running the same pipeline outward on your prospects' businesses. Every visitor who hits your site, automatically researched — their tech stack, recent funding, competitive landscape, hiring signals — synthesized and available to your sales team at the point of action.
The infrastructure is already there. The research pipeline, the synthesis layer, the structured knowledge base — it all generalizes. We just need to point it outward.
Stay tuned.
If you're working on context pipelines, agent architecture, or making AI sales tools work without long ramp periods — we're happy to share what's working.
Reach out →
