
How to Audit an AI Note-Taker for HIPAA’s Data Processing Agreement Requirements
Why an AI therapy note-taker raises HIPAA questions
Therapists are adopting AI note-takers for a simple reason: the workflow is obvious. Listen to the session, generate a draft, save time. The security problem appears as soon as you trace where the data goes. A therapy session is not an ordinary meeting recording. It contains identifiable health information, and the note that comes out of it can become part of the clinical record.
The NPR report that put this topic in view framed the tension well: is this a helpful tool, or a breach of trust? From a security and compliance angle, it can be both. That is especially likely if the vendor is vague about how it handles protected health information (PHI), or if the practice assumes the UI is the whole control plane.
A privacy policy is not a HIPAA contract. If a vendor creates, receives, maintains, or transmits PHI on behalf of a covered entity, HIPAA requires a business associate agreement (BAA). A DPA can fill that role only if it contains the required BAA terms. A marketing page never does.
What the tool sees, stores, and generates
For an AI note-taker, the interesting part is not just the final note. I usually map the full chain:
| Stage | Data involved | Typical risk |
|---|---|---|
| Capture | Live audio, captions, screen context, metadata | Unintended collection of extra PHI |
| Transcription | Audio-to-text output, timestamps, diarization | Persistent transcript retention |
| Prompting | Therapist instructions, templates, user edits | Leakage through prompt logs |
| Generation | Draft assessment, plan, summary, coding hints | Incorrect clinical wording or over-collection |
| Storage | Drafts, notes, revision history, backups | Long-lived PHI exposure |
| Sharing | Export to EHR, email, collaboration tools | Secondary disclosure path |
Seen this way, the product is not “just note-taking.” It is a pipeline that ingests sensitive data, transforms it, and often keeps multiple versions of the same encounter around.
Why note-taking is not just a productivity feature
A normal productivity app helps a user write faster. A therapy note-taker can change the compliance posture of the whole practice.
If the system records audio, it may capture more than the therapist planned to document. If it generates a draft note, that draft may include details the therapist would not have written by hand. If it stores transcripts for model improvement, analytics, or debugging, it may create a durable copy of PHI outside the EHR.
That is why this is not only a UX review. It is a data-flow and contract review. The first question is not “Does the app feel secure?” The first question is “What exactly does the vendor do with identifiable health data, and under what legal authority?”
Start by proving whether the vendor is a business associate
Map the exact PHI flow from session to output
Before reading any contract, draw the flow on one page:
- Who records the session?
- Does audio go directly to the vendor, or does it first pass through the clinic’s device or EHR?
- Is transcription local, vendor-hosted, or forwarded to another service?
- Are drafts generated in the browser, in the vendor backend, or through a third-party API?
- What gets saved after the therapist clicks “finalize”?
- What gets retained if the note is discarded?
That map tells you whether the vendor is really just providing a user interface or is actually processing PHI on behalf of the practice. In HIPAA terms, processing PHI on behalf of the practice usually makes the vendor a business associate.
A practical audit trick: label every hop with one of three states.
- Clinic-controlled
- Vendor-controlled
- Third-party-controlled
If you cannot classify a hop, that is already a finding.
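The hop-labeling trick can be kept as a small inventory, so that unclassified hops surface automatically. This is a minimal Python sketch. The hop names and the `Control` enum are hypothetical placeholders for whatever your own one-page flow diagram contains. They do not describe any real product's architecture.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Control(Enum):
    """The three control states used to label each hop."""
    CLINIC = "clinic-controlled"
    VENDOR = "vendor-controlled"
    THIRD_PARTY = "third-party-controlled"


# Hypothetical hops: (source, destination, controller).
# None means nobody could say who controls that hop.
HOPS: List[Tuple[str, str, Optional[Control]]] = [
    ("therapist device microphone", "browser capture", Control.CLINIC),
    ("browser capture", "vendor ingest API", Control.VENDOR),
    ("vendor ingest API", "speech-to-text provider", Control.THIRD_PARTY),
    ("vendor backend", "LLM API", None),
    ("vendor backend", "clinic EHR export", Control.VENDOR),
]


def unclassified_hops(hops: List[Tuple[str, str, Optional[Control]]]) -> List[Tuple[str, str]]:
    """Return every hop with an unknown controller; each one is an audit finding."""
    return [(src, dst) for src, dst, ctl in hops if ctl is None]


if __name__ == "__main__":
    for src, dst in unclassified_hops(HOPS):
        print(f"FINDING: cannot classify hop {src} -> {dst}")
```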
Check whether the vendor touches identifiable health data
A vendor may try to avoid business associate (BA) status by saying it “does not store medical records” or “only processes de-identified text.” That claim is too narrow if the service sees session content, patient names, diagnoses, medications, or therapist notes that can identify a person.
The vendor does not need to store a full chart to create HIPAA exposure. If the service receives PHI to transcribe, summarize, score, classify, or redact it, the risk is already there. The key question is whether the service can access identifiable health data, even briefly, in transit or in memory. HIPAA's narrow “conduit” exception covers transmission-only services such as internet providers or couriers. It does not cover a service that reads and transforms session content.
Look for these signs:
- It accepts raw audio from a therapy session.
- It transcribes named people, medications, conditions, or locations.
- It prompts an LLM with the transcript.
- It stores prompts or output for debugging, QA, or abuse detection.
- It can export directly into the clinical record.
If the answer to any of those is yes, the “we don’t handle PHI” claim is probably not doing much work.
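The five signs can be recorded as tri-state answers, where `None` marks a question the vendor has not answered yet. The sketch below assumes that convention. The key names are illustrative labels for the bullets above, and the verdict strings are audit shorthand, not legal conclusions.

```python
from typing import Dict, Optional


def ba_verdict(signs: Dict[str, Optional[bool]]) -> str:
    """Summarize yes/no/unknown answers to the business associate signs."""
    yes = [name for name, answer in signs.items() if answer is True]
    unknown = [name for name, answer in signs.items() if answer is None]
    if yes:
        # A single "yes" means the service touches identifiable health data.
        return "likely business associate: " + ", ".join(yes)
    if unknown:
        # Silence is not a "no"; keep asking until every sign is answered.
        return "unresolved: " + ", ".join(unknown)
    return "no PHI signals reported"


answers = {
    "accepts_raw_session_audio": True,
    "transcribes_identifiers": True,
    "prompts_llm_with_transcript": None,
    "stores_prompts_or_output": None,
    "exports_to_clinical_record": False,
}
print(ba_verdict(answers))
```

The verdict treats an unanswered question as open rather than as a "no." This matches the point above: a vendor's claim does not settle anything until it is tied to specific behavior.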
Spot common attempts to avoid BA status without changing the risk
Some vendors use language that sounds compliant while keeping the same technical behavior.
Common patterns:
- “We are only a processor” without a signed agreement that matches the data flow.
- “We never see PHI” when the service receives raw session audio.
- “Customer-configurable retention” while the backend still keeps operational logs.
- “De-identified by default” when the product is only usable with named patient context.
- “The customer is responsible for compliance” when the vendor still controls subprocessors and API routing.
The mistake is treating those phrases as if they describe the actual system. They do not. Your job is to connect the product claims to the implementation.
Read the data processing agreement like a control document
Required scope: use, disclosure, and permitted processing
For HIPAA, the contract is not decoration. It defines what the vendor can do with the data, what it cannot do, and what obligations attach to the service.
If a vendor offers a DPA, BAA, or both, read for the following:
| Contract term | What you want to see | Why it matters |
|---|---|---|
| Permitted use | Service delivery only, plus narrowly defined admin/security support | Prevents vendor from repurposing PHI |
| Disclosure limits | No sharing except approved subprocessors and legal requirements | Stops uncontrolled onward transfer |
| Training restrictions | No model training on PHI without explicit opt-in | Reduces reuse risk |
| Safeguards | Encryption, access control, logging, incident response | Gives you enforceable security duties |
| Subprocessor terms | Flow-down obligations and notification of changes | Controls third-party exposure |
If the agreement says the vendor may “improve the service” using customer data, stop there. That phrase often hides model training, fine-tuning, or human review. The contract needs to say exactly what is excluded by default.
Breach notice timelines and incident cooperation terms
A good contract does more than say “we will notify you.” It should say how fast, how much detail, and how the vendor will help after an incident.
At a minimum, I look for:
- prompt notice of suspected unauthorized access or disclosure
- a commitment to investigate and preserve evidence
- details on affected accounts, data types, and time windows
- cooperation with forensics, legal review, and customer notification
- a clear incident contact path
- no hidden “we may delay notice until our investigation is complete” escape hatch
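For HIPAA, timing matters. The Breach Notification Rule requires a business associate to notify the covered entity without unreasonable delay and no later than 60 calendar days after discovering a breach. The contract can set a shorter window. The business associate should not sit on a breach report while the covered entity guesses what happened. If the contract only promises vague or delayed notice, the vendor is shifting the burden onto the clinic.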
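The 60-day figure is a ceiling, not a target, because the rule still requires notice without unreasonable delay. The sketch below computes the latest permissible notice date from a discovery date and a contract's notice window. The function names are hypothetical, and the only regulatory input is the 60-day outer limit in 45 CFR 164.410.

```python
from datetime import date, timedelta
from typing import Optional

# Outer limit for a business associate notifying the covered entity
# (45 CFR 164.410). Notice is still required "without unreasonable delay";
# 60 days is a ceiling, not a target.
HIPAA_OUTER_LIMIT_DAYS = 60


def latest_notice_date(discovered: date, contract_days: Optional[int]) -> date:
    """Latest permissible notice date under the shorter of the contract window and the rule."""
    if contract_days is None:
        days = HIPAA_OUTER_LIMIT_DAYS
    else:
        days = min(contract_days, HIPAA_OUTER_LIMIT_DAYS)
    return discovered + timedelta(days=days)


def contract_window_ok(contract_days: Optional[int]) -> bool:
    """The contract should name a window, and that window cannot exceed the regulatory limit."""
    return contract_days is not None and contract_days <= HIPAA_OUTER_LIMIT_DAYS


print(latest_notice_date(date(2024, 3, 1), 10))    # 2024-03-11
print(latest_notice_date(date(2024, 3, 1), None))  # 2024-04-30
```

A contract that names 90 days does not extend the legal limit. The function caps it, and `contract_window_ok` flags it as a finding.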
Termination, deletion, and return-of-data language
This is where many AI tools get slippery. You want to know what happens when the practice cancels the service, stops using a workspace, or deletes a note.
Check whether the contract covers:
- return of customer data in a usable format
- deletion from production systems after termination
- deletion from backups on a defined schedule
- destruction of derived artifacts, not just source records
- a written confirmation of deletion on request
If the agreement only says “we may retain certain information as required by law,” ask what information, for how long, and under which retention basis. “Required by law” is often broader in vendor contracts than people expect.
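The risky phrases in this section are easy to miss in a long agreement. Examples include “improve the service,” “we may delay notice,” and “as required by law.” The minimal Python sketch below flags them line by line. The phrase list is an assumption drawn from this article and is not exhaustive. A match is a prompt to ask a question, and a clean scan proves nothing about the contract.

```python
import re
from typing import List, Tuple

# Illustrative red-flag patterns, each paired with the follow-up concern.
RED_FLAGS = {
    r"improve (?:the|our) (?:service|services|models?)": "possible training or human-review reuse",
    r"as required by law": "open-ended retention basis: ask what, how long, and why",
    r"delay (?:notice|notification)": "possible breach-notice escape hatch",
    r"service providers|infrastructure partners": "vague subprocessor description",
}


def scan_contract(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, matched phrase, concern) for each red-flag hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, concern in RED_FLAGS.items():
            for match in re.finditer(pattern, line, flags=re.IGNORECASE):
                hits.append((lineno, match.group(0), concern))
    return hits


sample = """Vendor may use Customer Data to improve the Service.
Vendor may retain certain information as required by law.
Vendor may delay notice until its investigation is complete."""

for hit in scan_contract(sample):
    print(hit)
```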
Audit retention controls instead of trusting the product UI
Session transcripts, drafts, prompts, and final notes
The UI may show a delete button. That does not tell you what actually exists underneath.
For a therapy note-taker, the retention surface usually includes:
- raw audio
- transcript text
- prompt history
- generated draft notes
- edited final notes
- autosaves and version history
- support tickets with attached logs
- analytics events
- backups and replicas
You need to ask about each category separately. A vendor might delete the final note but keep the transcript. Or it may purge the UI record while retaining a hidden support copy for 90 days.
That is not a theoretical concern. In practice, retention bugs often come from product and infrastructure layers that were never designed together.
Admin settings versus backend defaults
A lot of products expose retention controls in admin settings, but the backend may still have default retention windows that override the UI.
Test for these mismatches:
- UI says “delete immediately,” backend keeps 30-day logs
- workspace admin sets a short retention period, but backups ignore it
- a note is deleted from the customer portal, but still appears in support tooling
- transcription is disabled in the UI, but speech logs remain for abuse detection
If you can get API documentation, compare the API behavior against the admin console. If the API and UI do not match, trust the API behavior. That is usually the real control path.
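One way to reason about UI versus backend mismatches is to treat a data type's effective retention as the longest window among all the stores that hold it. The sketch below assumes you have collected per-store windows, in days, from the vendor. The store names and numbers are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical retention windows (days) for transcripts, gathered per store.
TRANSCRIPT_STORES: Dict[str, int] = {
    "admin UI setting": 0,    # "delete immediately"
    "application logs": 30,
    "backups": 35,
    "support tooling": 90,
}


def effective_retention(stores: Dict[str, int]) -> Tuple[int, str]:
    """The longest window wins: data survives until the last copy is purged."""
    store = max(stores, key=lambda name: stores[name])
    return stores[store], store


def mismatches(stores: Dict[str, int], ui_key: str = "admin UI setting") -> Dict[str, int]:
    """Stores that keep data longer than the UI setting claims."""
    ui_days = stores[ui_key]
    return {name: days for name, days in stores.items() if days > ui_days}


print(effective_retention(TRANSCRIPT_STORES))
print(mismatches(TRANSCRIPT_STORES))
```

Repeat this for each data type in the retention surface, because raw audio, prompts, and final notes often live in different stores.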
How to test whether deletion is real or only cosmetic
Use a harmless synthetic session. Do not use real patient material. I usually create a test record with:
- a fake patient name
- a unique nonsense phrase
- a timestamp I can search for later
- a known note template
Then I ask the vendor to demonstrate deletion in three places:
- the user UI
- the admin audit view or API
- any support/export mechanism they control
After deletion, check whether the unique phrase still appears in:
- search results
- exported workspace data
- activity logs
- support logs
- “recently deleted” or recovery bins
If the vendor cannot explain which stores are deleted synchronously and which are eventually purged, that is a real retention finding.
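The unique nonsense phrase works as a canary. The minimal Python sketch below generates one, then searches an unpacked export directory for leftovers after deletion. It assumes the export is plain text or JSON on local disk, so compressed or binary formats need unpacking first. It also covers only the stores you can export. Vendor-side logs and backups still need the vendor's own explanation.

```python
import secrets
from pathlib import Path
from typing import List


def make_canary() -> str:
    """A unique nonsense phrase that will not occur in real data."""
    return "canary-" + secrets.token_hex(8)


def find_canary(root: Path, canary: str) -> List[Path]:
    """Return every file under root (e.g. an unpacked workspace export) still containing the canary."""
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: note it manually rather than guessing
        if canary in text:
            hits.append(path)
    return hits
```

Insert the canary into the synthetic session before deletion. Afterward, run `find_canary` over every export you can obtain. Each hit is a store that the deletion did not reach.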
Trace subprocessors and third-party API exposure
Speech-to-text, LLM, logging, analytics, and support tools
The vendor is rarely the only processor in the chain. An AI note-taker may rely on:
- a speech-to-text provider
- an LLM API
- a cloud logging pipeline
- analytics instrumentation
- customer support platforms
- abuse detection or safety filters
- email or notification services
Each one expands the exposure surface. Each one can become a hidden route for PHI.
A good audit does not stop at “we are on AWS.” You need to know whether the vendor sends session data to an external transcription provider, whether prompts are logged by a separate observability stack, and whether support staff can see customer content in a ticketing tool.
What the vendor should disclose in a subprocessor list
A serious vendor should keep a subprocessor list that is actually useful. Not all lists are equal. I want to see:
- legal entity name
- service purpose
- data categories handled
- region or hosting location
- whether the provider can access content or only metadata
- change-notification process
If the list only says “infrastructure partners” or “service providers,” it is too vague to audit.
You can ask a simple question: “Which subprocessors can access session content, and which only handle metadata?” If the vendor cannot answer that cleanly, the architecture is not well understood.
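A subprocessor list can be checked mechanically against the six fields listed above. The sketch below assumes the list has been transcribed into dictionaries with hypothetical field names. The entries are placeholders, not real vendors.

```python
from typing import Dict, List

REQUIRED_FIELDS = ("legal_entity", "purpose", "data_categories", "region",
                   "content_access", "change_notice")

# Placeholder entries; content_access is True (content), False (metadata only), or None (unknown).
SUBPROCESSORS: List[dict] = [
    {"legal_entity": "ExampleSTT Inc.", "purpose": "speech-to-text",
     "data_categories": ["audio", "transcript"], "region": "us-east",
     "content_access": True, "change_notice": "30 days by email"},
    {"legal_entity": "Infrastructure partners", "purpose": "", "data_categories": [],
     "region": "", "content_access": None, "change_notice": ""},
]


def incomplete_entries(entries: List[dict]) -> Dict[str, List[str]]:
    """Map each entry to the required fields it leaves empty or unknown."""
    gaps = {}
    for entry in entries:
        missing = [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "", [])]
        if missing:
            gaps[entry.get("legal_entity") or "<unnamed>"] = missing
    return gaps


def content_processors(entries: List[dict]) -> List[str]:
    """Subprocessors that can access session content, not just metadata."""
    return [e["legal_entity"] for e in entries if e.get("content_access") is True]


print(incomplete_entries(SUBPROCESSORS))
print(content_processors(SUBPROCESSORS))
```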
Risk checks for cross-border transfer and model training reuse
Two issues show up often in these deployments:
- Cross-border processing
- Training reuse
Cross-border processing matters if transcripts or audio are routed through regions outside your expected jurisdiction. That may not be a dealbreaker, but it should be explicit. You need to know where content is stored, where it is processed, and where support staff can access it.
Training reuse is usually more sensitive. Ask directly:
- Is customer content used to train foundation models?
- Is content used for fine-tuning?
- Is it used for human review?
- Is there an opt-out, and is it default-on or default-off?
- Does opt-out cover logs, prompts, outputs, and backups?
If the vendor says “we may use data to improve our models,” that is not acceptable for therapy content unless the contract and implementation are both explicit and opt-in.
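The five training-reuse questions reduce to two findings. The first is reuse that is on by default. The second is a control that leaves some data types uncovered. The sketch below assumes you record the vendor's answers in a hypothetical `TrainingAnswers` record with illustrative field names.

```python
from dataclasses import dataclass, field
from typing import List, Set

ALL_TYPES = {"logs", "prompts", "outputs", "backups"}


@dataclass
class TrainingAnswers:
    foundation_training: bool
    fine_tuning: bool
    human_review: bool
    default_off: bool  # True only if reuse requires an explicit opt-in
    control_covers: Set[str] = field(default_factory=set)  # data types the control applies to


def training_findings(a: TrainingAnswers) -> List[str]:
    """List training-reuse findings; an empty list means no reuse issue was reported."""
    findings = []
    reuse = a.foundation_training or a.fine_tuning or a.human_review
    if reuse and not a.default_off:
        findings.append("customer content is reused by default")
    uncovered = ALL_TYPES - a.control_covers
    if reuse and uncovered:
        findings.append("control does not cover: " + ", ".join(sorted(uncovered)))
    return findings


answers = TrainingAnswers(foundation_training=False, fine_tuning=True, human_review=False,
                          default_off=False, control_covers={"prompts", "outputs"})
for finding in training_findings(answers):
    print(finding)
```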
Verify safeguards for breach risk and confidentiality
Authentication, access control, and session isolation
The first security test is boring for a reason. If the vendor cannot protect access, everything else is noise.
Check for:
- SSO support (for example via SAML or OpenID Connect)
- MFA for privileged users
- role-based access control
- per-tenant isolation
- session timeout controls
- audit logs for data access and admin actions
For a therapy deployment, access control should be stricter than a generic SaaS dashboard. A receptionist should not be able to browse transcripts. A support engineer should not casually read session content. A developer should not be able to search production PHI from a console unless there is a documented, tightly controlled break-glass process.
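If the vendor can export its role-to-permission matrix, the rules above become a set intersection. The sketch below assumes a hypothetical matrix and hypothetical permission names. A real console will use its own vocabulary. Developer production access appears as a finding here because it should exist only through a documented break-glass process, not as a standing role permission.

```python
from typing import Dict, List, Set

# Hypothetical matrix exported from an admin console.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "therapist": {"read_own_transcripts", "edit_notes"},
    "receptionist": {"schedule", "read_transcripts"},
    "support_engineer": {"read_tickets"},
    "developer": {"query_production"},
}

# Permissions that expose session content, and the roles allowed to hold them.
CONTENT_PERMISSIONS = {"read_transcripts", "read_own_transcripts", "query_production"}
ALLOWED_ROLES = {"therapist"}


def overbroad_roles(matrix: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Roles outside the allowed set that still hold content-exposing permissions."""
    return {
        role: sorted(perms & CONTENT_PERMISSIONS)
        for role, perms in matrix.items()
        if role not in ALLOWED_ROLES and perms & CONTENT_PERMISSIONS
    }


print(overbroad_roles(ROLE_PERMISSIONS))
```

Pair this static check with the low-privilege account test from the test cases. A matrix shows what the console claims, while a login shows what actually happens.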
Encryption, key management, and log hygiene
Encryption is easy to claim and harder to audit. Ask for the boring specifics:
- encryption in transit
- encryption at rest
- key management ownership
- rotation policy
- whether customer-managed keys are supported
- whether logs may contain PHI
- whether support tooling redacts content by default
Log hygiene is where a lot of AI tools leak data. The product may redact the visible UI but still write full prompts or transcript fragments into tracing systems. If that happens, your “deletion” story is already broken.
A useful rule: if logs can reconstruct a session, treat them as sensitive records.
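That rule can be tested with the synthetic session. Measure what fraction of transcript sentences appear verbatim in exported logs or traces. The sketch below is minimal, and verbatim sentence matching is a crude assumption because logs may truncate, tokenize, or encode text. A zero result is therefore weak evidence. Any nonzero result means the logs should be handled as sensitive records.

```python
import re
from typing import List


def sentences(text: str) -> List[str]:
    """Split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def reconstruction_ratio(transcript: str, log_text: str) -> float:
    """Fraction of transcript sentences that appear verbatim somewhere in the logs."""
    sents = sentences(transcript)
    if not sents:
        return 0.0
    found = sum(1 for s in sents if s in log_text)
    return found / len(sents)


transcript = "I slept badly. Work was stressful. The test phrase is zebra-lantern."
logs = 'trace prompt="I slept badly. Work was stressful." latency_ms=812'
print(reconstruction_ratio(transcript, logs))
```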
Redaction, minimization, and least-data collection
The best compliance control is not a legal clause. It is collecting less data in the first place.
Ask whether the product can:
- limit recording windows
- redact names, phone numbers, addresses, and insurance IDs
- avoid storing raw audio when only the final note is needed
- suppress unnecessary metadata
- separate therapist workflow data from patient content
Minimization matters because it reduces the blast radius of a breach and makes retention easier to reason about. If a vendor needs full audio plus full transcript plus full prompts to generate a note, that should be a deliberate tradeoff, not an accident.
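Pattern-based redaction is one simple way to strip structured identifiers before text is stored. The sketch below assumes US-style phone numbers and a hypothetical insurance member ID format. Regex catches structured identifiers but misses names and free-text details, so treat it as a minimization aid, not de-identification.

```python
import re

# Illustrative patterns only; the member ID format is a hypothetical example.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "MEMBER_ID": re.compile(r"\b[A-Z]{3}\d{6,10}\b"),
}


def redact(text: str) -> str:
    """Replace each matched identifier with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Call 555-123-4567 or mail jo@example.com, member ABC1234567."))
```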
Build a practical audit workflow for a therapy deployment
Questions to ask the vendor before procurement
I like to send a short, direct questionnaire before anyone signs paper. The goal is not to win an argument. The goal is to force a concrete architecture answer.
Ask:
- Do you process PHI on behalf of a covered entity?
- Will you sign a BAA or equivalent DPA for PHI handling?
- Do you store raw audio, transcripts, prompts, drafts, and final notes?
- What is the default retention period for each data type?
- Can we delete each data type separately?
- Do you use customer content for model training or human review?
- Which subprocessors can access content?
- Where is content stored and processed?
- What security certifications or independent reports do you have?
- How do you notify customers of incidents and subprocessor changes?
If the vendor answers in vague generalities, keep pushing until the answer becomes operational.
Evidence to request from the security and legal teams
You do not need a 200-question procurement checklist to get signal. You need the right artifacts.
| Evidence | What it should tell you |
|---|---|
| Signed BAA or DPA | Legal authority and permitted processing |
| Subprocessor list | Third-party exposure and change notice |
| Data retention policy | What gets kept, where, and for how long |
| Incident response summary | How fast the vendor responds to breaches |
| Security architecture overview | Isolation, encryption, and logging controls |
| Deletion or export documentation | Whether data lifecycle controls are real |
| Independent assessment | Whether claims were reviewed by a third party |
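If they offer a SOC 2 report, read it for scope. A report can be useful and still miss the exact service you care about. The question is whether the note-taking workflow, not just the parent company, is inside the audited boundary. Also check whether it is a Type I report, which covers control design at a point in time, or a Type II report, which covers control operation over a review period.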
Safe test cases that do not expose real patient records
Use synthetic data and controlled sessions. Good test cases include:
- a fake appointment with no real names
- a transcript containing one unique test phrase
- a note generated from a canned script
- a deliberate deletion test
- a role-based access test with low-privilege and admin accounts
A safe verification sequence looks like this:
- Create a test workspace.
- Run a synthetic therapy-style session.
- Confirm the transcript, draft, and final note appear where expected.
- Delete the session.
- Search the UI, API, exports, and support view for the unique test phrase.
- Ask the vendor to explain any remaining traces.
That gives you a concrete result without exposing actual patient content.
Red flags that should stop a rollout
Vague retention claims
If the vendor says “we retain data only as needed” and cannot name the data classes, retention windows, or deletion path, stop. That is not a control.
No BAA or DPA, or no willingness to sign one
If the vendor refuses a BAA, or says it is “not necessary” despite touching identifiable health data, the discussion is over. You do not have a compliant deployment without contractual coverage.
Hidden training use, weak deletion, or undisclosed subprocessors
Any one of these should trigger escalation:
- customer content may train the model by default
- deleted notes still appear in backups or support tools with no purge policy
- the vendor will not name its transcription or LLM providers
- the subprocessor list changes without notice
- there is no way to scope support access so that working on one customer’s ticket does not expose another customer’s content
These are not small paperwork issues. They are signs that the vendor has not separated product convenience from sensitive-data handling.
What a defensible deployment looks like
Contract, technical, and operational controls working together
A defensible rollout is not one magic document. It is a stack:
- a signed BAA or appropriate DPA
- a defined PHI flow with minimal collection
- explicit retention and deletion controls
- disclosed subprocessors
- verified access controls and audit logs
- a tested incident response path
- a no-training-by-default rule for customer content
If any layer is missing, the whole posture weakens. The product may still be useful, but you cannot call it boringly safe.
Monitoring after launch and periodic re-audit
The audit does not end at procurement. Vendors change infrastructure, add subprocessors, update model providers, and adjust logging behavior.
Re-audit on a schedule:
- after major product releases
- after a subprocessor change
- after retention or deletion changes
- after a security incident
- after a new integration is added
I also recommend a standing review of:
- access logs
- deletion verification samples
- subprocessor notices
- contract amendments
- support access exceptions
That turns the deployment from a one-time approval into a living control.
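Subprocessor notices are easier to review as a diff between snapshots. The sketch below assumes each snapshot maps a subprocessor name to whether it can access session content, which is the same content-versus-metadata distinction asked for in the subprocessor list. The names are placeholders. Treating additions and new content access as re-audit triggers is one reasonable policy, not a rule.

```python
from typing import Dict, List


def diff_subprocessors(prev: Dict[str, bool], curr: Dict[str, bool]) -> Dict[str, List[str]]:
    """Compare two snapshots of {subprocessor name: can access session content}."""
    added = sorted(set(curr) - set(prev))
    removed = sorted(set(prev) - set(curr))
    gained = sorted(n for n in set(prev) & set(curr) if curr[n] and not prev[n])
    return {"added": added, "removed": removed, "gained_content_access": gained}


def needs_reaudit(diff: Dict[str, List[str]]) -> bool:
    """New processors or newly granted content access reopen the audit."""
    return bool(diff["added"] or diff["gained_content_access"])


last_quarter = {"STT Provider A": True, "Logging Service B": False}
this_quarter = {"STT Provider A": True, "Logging Service B": True, "LLM Provider C": True}
changes = diff_subprocessors(last_quarter, this_quarter)
print(changes, needs_reaudit(changes))
```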
Conclusion: useful note-taking is not the same as compliant note-taking
AI can absolutely help a therapist draft notes faster, but speed is not the compliance bar. The real question is whether the vendor can prove what data it touches, why it can touch it, how long it keeps it, who can see it, and what happens when something goes wrong.
If you audit the business associate status, the contract, retention, subprocessors, and security controls together, you get a real answer. If you only look at the product UI, you are hoping the backend agrees with the marketing.


