How to Audit an AI Note-Taker Against HIPAA’s Business Associate Agreement Requirements


Why an AI therapy note-taker raises HIPAA questions

Therapists are adopting AI note-takers for a simple reason: the workflow is obvious. Listen to the session, generate a draft, save time. The security problem appears as soon as you trace where the data goes. A therapy session is not an ordinary meeting recording. It contains identifiable health information, and the note that comes out of it can become part of the clinical record.

The NPR report that put this topic in view framed the tension well: is this a helpful tool, or a breach of trust? From a security and compliance angle, it can be both if the vendor is vague about how it handles protected health information, or if the practice assumes the UI is the whole control plane.

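⚠️ A privacy policy is not a HIPAA contract. If a vendor creates, receives, maintains, or transmits PHI on behalf of a covered entity, HIPAA requires a business associate agreement (BAA). A marketing page, or a generic data processing agreement that lacks the required BAA terms, does not fill that role.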

What the tool sees, stores, and generates

For an AI note-taker, the interesting part is not just the final note. I usually map the full chain:

Stage	Data involved	Typical risk
Capture	Live audio, captions, screen context, metadata	Unintended collection of extra PHI
Transcription	Audio-to-text output, timestamps, diarization	Persistent transcript retention
Prompting	Therapist instructions, templates, user edits	Leakage through prompt logs
Generation	Draft assessment, plan, summary, coding hints	Incorrect clinical wording or over-collection
Storage	Drafts, notes, revision history, backups	Long-lived PHI exposure
Sharing	Export to EHR, email, collaboration tools	Secondary disclosure path

Seen this way, the product is not “just note-taking.” It is a pipeline that ingests sensitive data, transforms it, and often keeps multiple versions of the same encounter around.

Why note-taking is not just a productivity feature

A normal productivity app helps a user write faster. A therapy note-taker can change the compliance posture of the whole practice.

If the system records audio, it may capture more than the therapist planned to document. If it generates a draft note, that draft may include details the therapist would not have written by hand. If it stores transcripts for model improvement, analytics, or debugging, it may create a durable copy of PHI outside the EHR.

That is why this is not only a UX review. It is a data-flow and contract review. The first question is not “Does the app feel secure?” The first question is “What exactly does the vendor do with identifiable health data, and under what legal authority?”

Start by proving whether the vendor is a business associate

Map the exact PHI flow from session to output

Before reading any contract, draw the flow on one page:

Who records the session?
Does audio go directly to the vendor, or does it first pass through the clinic’s device or EHR?
Is transcription local, vendor-hosted, or forwarded to another service?
Are drafts generated in the browser, in the vendor backend, or through a third-party API?
What gets saved after the therapist clicks “finalize”?
What gets retained if the note is discarded?

That map tells you whether the vendor is really just providing a user interface or is actually processing PHI on behalf of the practice. In HIPAA terms, the second case usually puts the vendor in business associate territory.

A practical audit trick: label every hop with one of three states.

Clinic-controlled
Vendor-controlled
Third-party-controlled

If you cannot classify a hop, that is already a finding.
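The labeling trick can be kept as a small data structure rather than a whiteboard drawing. The sketch below is one possible way to do that; the `Hop` type, the step names, and the example flow are all hypothetical and should be replaced with what the vendor actually documents.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Control(Enum):
    CLINIC = "clinic-controlled"
    VENDOR = "vendor-controlled"
    THIRD_PARTY = "third-party-controlled"


@dataclass(frozen=True)
class Hop:
    step: str                   # what happens at this point in the flow
    data: str                   # which PHI-bearing data is involved
    control: Optional[Control]  # None means nobody could classify the hop


def unclassified_hops(flow: list[Hop]) -> list[Hop]:
    """Return every hop without a control state; each one is an audit finding."""
    return [hop for hop in flow if hop.control is None]


# Hypothetical flow, for illustration only.
flow = [
    Hop("record session on clinic laptop", "live audio", Control.CLINIC),
    Hop("transcribe audio", "audio, transcript", Control.THIRD_PARTY),
    Hop("generate draft note", "transcript, prompt", Control.VENDOR),
    Hop("store prompt logs", "prompt, output", None),
]

for hop in unclassified_hops(flow):
    print(f"FINDING: cannot classify '{hop.step}' ({hop.data})")
```

Keeping the map in a reviewable file also makes it easy to diff when the vendor changes its architecture.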

Check whether the vendor touches identifiable health data

A vendor often tries to avoid business associate (BA) status by saying it “does not store medical records” or “only processes de-identified text.” That claim is too narrow if the service sees session content, patient names, diagnoses, medications, or therapist notes that can identify a person.

You do not need the vendor to store a full chart to create HIPAA exposure. If the service receives PHI to transcribe, summarize, score, classify, or redact it, the risk is already there. The key question is whether the service can access identifiable health data, even briefly, in transit or in memory.

Look for these signs:

It accepts raw audio from a therapy session.
It transcribes named people, medications, conditions, or locations.
It prompts an LLM with the transcript.
It stores prompts or output for debugging, QA, or abuse detection.
It can export directly into the clinical record.

If the answer to any of those is yes, the “we don’t handle PHI” claim is probably not doing much work.

Spot common attempts to avoid BA status without changing the risk

Some vendors use language that sounds compliant while keeping the same technical behavior.

Common patterns:

“We are only a processor” without a signed agreement that matches the data flow.
“We never see PHI” when the service receives raw session audio.
“Customer-configurable retention” while the backend still keeps operational logs.
“De-identified by default” when the product is only usable with named patient context.
“The customer is responsible for compliance” when the vendor still controls subprocessors and API routing.

The mistake is treating those phrases as if they describe the actual system. They do not. Your job is to connect the product claims to the implementation.

Read the data processing agreement like a control document

Required scope: use, disclosure, and permitted processing

For HIPAA, the contract is not decoration. It defines what the vendor can do with the data, what it cannot do, and what obligations attach to the service.

If a vendor offers a DPA, BAA, or both, read for the following:

Contract term	What you want to see	Why it matters
Permitted use	Service delivery only, plus narrowly defined admin/security support	Prevents vendor from repurposing PHI
Disclosure limits	No sharing except approved subprocessors and legal requirements	Stops uncontrolled onward transfer
Training restrictions	No model training on PHI without explicit opt-in	Reduces reuse risk
Safeguards	Encryption, access control, logging, incident response	Gives you enforceable security duties
Subprocessor terms	Flow-down obligations and notification of changes	Controls third-party exposure

If the agreement says the vendor may “improve the service” using customer data, stop there. That phrase often hides model training, fine-tuning, or human review. The contract needs to say exactly what is excluded by default.
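A first pass over contract text can be automated to surface clauses that deserve a careful manual read. This is only a rough sketch: the phrase list is a hypothetical starting point, not a legal standard, and a keyword hit means "read this clause," not "this clause is bad."

```python
import re

# Hypothetical phrase list; extend it with wording from the contracts you review.
RISKY_PHRASES = [
    r"improve (the|our) (services?|models?|products?)",
    r"product development",
    r"analytics purposes",
    r"as required by law",
]


def flag_clauses(contract_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs containing a phrase worth a manual read."""
    hits = []
    for number, line in enumerate(contract_text.splitlines(), start=1):
        if any(re.search(p, line, re.IGNORECASE) for p in RISKY_PHRASES):
            hits.append((number, line.strip()))
    return hits


# Made-up contract excerpt for illustration.
sample = """\
4.1 Vendor processes Customer Data solely to provide the Service.
4.2 Vendor may use Customer Data to improve the Service.
9.3 Vendor may retain certain information as required by law.
"""

for number, line in flag_clauses(sample):
    print(f"line {number}: {line}")
```

The scan does not replace legal review. It just makes sure the broad-use clauses are found before anyone signs.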

Breach notice timelines and incident cooperation terms

A good contract does more than say “we will notify you.” It should say how fast, how much detail, and how the vendor will help after an incident.

At a minimum, I look for:

prompt notice of suspected unauthorized access or disclosure
a commitment to investigate and preserve evidence
details on affected accounts, data types, and time windows
cooperation with forensics, legal review, and customer notification
a clear incident contact path
no hidden “we may delay notice until our investigation is complete” escape hatch

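For HIPAA, timing matters. Under the Breach Notification Rule, a business associate must notify the covered entity without unreasonable delay and no later than 60 calendar days after discovering a breach, and a contract can require a shorter window. The business associate should not sit on a breach report while the covered entity guesses what happened. If the contract only promises vague or delayed notice, the vendor is shifting the burden onto the clinic.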
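To make the timing concrete, the sketch below computes the latest acceptable notice date from a discovery date. It assumes HIPAA's 60-calendar-day outer limit for business associate notice (45 CFR 164.410) and a contractual window in calendar days; the function names are illustrative, and the contract figure is whatever your agreement actually says.

```python
from datetime import date, timedelta

# Outer limit for business associate notice to the covered entity (45 CFR 164.410).
HIPAA_OUTER_LIMIT_DAYS = 60


def notice_deadline(discovered: date, contract_days: int) -> date:
    """Latest notice date, using the stricter of the contract window and the HIPAA limit."""
    return discovered + timedelta(days=min(contract_days, HIPAA_OUTER_LIMIT_DAYS))


def is_late(discovered: date, notified: date, contract_days: int) -> bool:
    """True if the vendor notified after the applicable deadline."""
    return notified > notice_deadline(discovered, contract_days)


# Hypothetical incident: discovered on 1 March 2024 under a 10-day contract window.
print(notice_deadline(date(2024, 3, 1), contract_days=10))
print(is_late(date(2024, 3, 1), date(2024, 3, 12), contract_days=10))
```

A contract window longer than 60 days does not extend the deadline, which is why the function takes the minimum of the two.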

Termination, deletion, and return-of-data language

This is where many AI tools get slippery. You want to know what happens when the practice cancels the service, stops using a workspace, or deletes a note.

Check whether the contract covers:

return of customer data in a usable format
deletion from production systems after termination
deletion from backups on a defined schedule
destruction of derived artifacts, not just source records
a written confirmation of deletion on request

If the agreement only says “we may retain certain information as required by law,” ask what information, for how long, and under which retention basis. “Required by law” is often broader in vendor contracts than people expect.

Audit retention controls instead of trusting the product UI

Session transcripts, drafts, prompts, and final notes

The UI may show a delete button. That does not tell you what actually exists underneath.

For a therapy note-taker, the retention surface usually includes:

raw audio
transcript text
prompt history
generated draft notes
edited final notes
autosaves and version history
support tickets with attached logs
analytics events
backups and replicas

You need to ask about each category separately. A vendor might delete the final note but keep the transcript. Or it may purge the UI record while retaining a hidden support copy for 90 days.

That is not a theoretical concern. In practice, retention bugs come from product and infrastructure layers that were never designed together.

Admin settings versus backend defaults

A lot of products expose retention controls in admin settings, but the backend may still have default retention windows that override the UI.

Test for these mismatches:

UI says “delete immediately,” backend keeps 30-day logs
workspace admin sets a short retention period, but backups ignore it
a note is deleted from the customer portal, but still appears in support tooling
transcription is disabled in the UI, but speech logs remain for abuse detection

If you can get API documentation, compare the API behavior against the admin console. If the API and UI do not match, trust the API behavior. That is usually the real control path.
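One way to record these mismatches is to put the admin-console settings next to what the backend is documented or observed to do, one data class at a time. The data classes and day counts below are hypothetical; the point of the sketch is that an unknown backend value is treated as a finding, not as a pass.

```python
from typing import Optional

# Retention in days per data class. All values here are made up for illustration.
ui_settings: dict[str, int] = {
    "raw_audio": 0,
    "transcript": 7,
    "prompt_logs": 7,
    "backups": 7,
}
backend_observed: dict[str, Optional[int]] = {
    "raw_audio": 0,
    "transcript": 7,
    "prompt_logs": 30,
    "backups": None,  # vendor could not state a number
}


def retention_mismatches(
    ui: dict[str, int], backend: dict[str, Optional[int]]
) -> list[tuple[str, str]]:
    """Return (data_class, reason) for every class where the backend outlives the UI setting."""
    findings = []
    for data_class, ui_days in ui.items():
        backend_days = backend.get(data_class)
        if backend_days is None:
            findings.append((data_class, "backend retention unknown"))
        elif backend_days > ui_days:
            findings.append(
                (data_class, f"UI says {ui_days}d, backend keeps {backend_days}d")
            )
    return findings


for data_class, reason in retention_mismatches(ui_settings, backend_observed):
    print(f"{data_class}: {reason}")
```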

How to test whether deletion is real or only cosmetic

Use a harmless synthetic session. Do not use real patient material. I usually create a test record with:

a fake patient name
a unique nonsense phrase
a timestamp I can search for later
a known note template

Then I ask the vendor to demonstrate deletion in three places:

the user UI
the admin audit view or API
any support/export mechanism they control

After deletion, check whether the unique phrase still appears in:

search results
exported workspace data
activity logs
support logs
“recently deleted” or recovery bins

If the vendor cannot explain which stores are deleted synchronously and which are eventually purged, that is a real retention finding.
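The unique-phrase check is easy to script once you have text exports from each store. The sketch below assumes you can obtain those exports as plain strings (from the UI export, the API, or a vendor-supplied dump); the store names and the `zq-audit` prefix are hypothetical.

```python
import secrets


def make_canary(prefix: str = "zq-audit") -> str:
    """Build a unique nonsense phrase that should never occur in real content."""
    return f"{prefix}-{secrets.token_hex(8)}"


def stores_with_trace(canary: str, store_dumps: dict[str, str]) -> list[str]:
    """Return the names of stores whose exported text still contains the canary."""
    needle = canary.lower()
    return sorted(
        name for name, text in store_dumps.items() if needle in text.lower()
    )


# Usage with synthetic data only: plant the canary in the test session, delete
# the session, then collect exports and search them.
canary = make_canary()
dumps_after_deletion = {
    "search_results": "",
    "workspace_export": "no sessions found",
    "support_logs": f"ticket 1042 attachment: ... {canary} ...",
    "recovery_bin": "",
}
print(stores_with_trace(canary, dumps_after_deletion))
```

Any store name in the result is a place where deletion was cosmetic, or where the purge is delayed and the vendor needs to state the schedule.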

Trace subprocessors and third-party API exposure

Speech-to-text, LLM, logging, analytics, and support tools

The vendor is rarely the only processor in the chain. An AI note-taker may rely on:

a speech-to-text provider
an LLM API
a cloud logging pipeline
analytics instrumentation
customer support platforms
abuse detection or safety filters
email or notification services

Each one expands the exposure surface. Each one can become a hidden route for PHI.

A good audit does not stop at “we are on AWS.” You need to know whether the vendor sends session data to an external transcription provider, whether prompts are logged by a separate observability stack, and whether support staff can see customer content in a ticketing tool.

What the vendor should disclose in a subprocessor list

A serious vendor should keep a subprocessor list that is actually useful. Not all lists are equal. I want to see:

legal entity name
service purpose
data categories handled
region or hosting location
whether the provider can access content or only metadata
change-notification process

If the list only says “infrastructure partners” or “service providers,” it is too vague to audit.

You can ask a simple question: “Which subprocessors can access session content, and which only handle metadata?” If the vendor cannot answer that cleanly, the architecture is not well understood.
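If the vendor provides the list in a structured form, or you transcribe it into one, the completeness check can be mechanical. The field names below are an assumed schema that mirrors the items listed above, and both example entries are fictional.

```python
REQUIRED_FIELDS = (
    "legal_entity",
    "purpose",
    "data_categories",
    "region",
    "content_access",  # e.g. "content" or "metadata"
)


def incomplete_entries(subprocessors: list[dict]) -> dict[str, list[str]]:
    """Map each incomplete entry (by name, or by position) to the fields it leaves blank."""
    gaps = {}
    for index, entry in enumerate(subprocessors):
        missing = [f for f in REQUIRED_FIELDS if entry.get(f) in (None, "", [])]
        if missing:
            gaps[entry.get("legal_entity") or f"entry #{index}"] = missing
    return gaps


# Fictional list for illustration.
subprocessors = [
    {
        "legal_entity": "ExampleSpeech Inc.",
        "purpose": "speech-to-text",
        "data_categories": ["audio"],
        "region": "us-east",
        "content_access": "content",
    },
    {
        "legal_entity": "infrastructure partners",
        "purpose": "",
        "data_categories": [],
        "region": None,
        "content_access": None,
    },
]

print(incomplete_entries(subprocessors))
```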

Risk checks for cross-border transfer and model training reuse

Two issues show up often in these deployments:

Cross-border processing
Training reuse

Cross-border processing matters if transcripts or audio are routed through regions outside your expected jurisdiction. That may not be a dealbreaker, but it should be explicit. You need to know where content is stored, where it is processed, and where support staff can access it.

Training reuse is usually more sensitive. Ask directly:

Is customer content used to train foundation models?
Is content used for fine-tuning?
Is it used for human review?
Is there an opt-out, and is it default-on or default-off?
Does opt-out cover logs, prompts, outputs, and backups?

If the vendor says “we may use data to improve our models,” that is not acceptable for therapy content unless the contract and implementation are both explicit and opt-in.

Verify safeguards for breach risk and confidentiality

Authentication, access control, and session isolation

The first security test is boring for a reason. If the vendor cannot protect access, everything else is noise.

Check for:

SSO or SAML support
MFA for privileged users
role-based access control
per-tenant isolation
session timeout controls
audit logs for data access and admin actions

For a therapy deployment, access control should be stricter than a generic SaaS dashboard. A receptionist should not be able to browse transcripts. A support engineer should not casually read session content. A developer should not be able to search production PHI from a console unless there is a documented, tightly controlled break-glass process.
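Those expectations can be written down as a deny list and compared against what each test account can actually open. This is a sketch under assumptions: the role and resource names are invented, and the `observed` mapping stands in for results you gather by logging in with low-privilege test accounts against synthetic data.

```python
# Hypothetical roles and resources; replace them with the vendor's actual role model.
FORBIDDEN: dict[str, set[str]] = {
    "receptionist": {"transcript", "raw_audio"},
    "support_engineer": {"transcript", "raw_audio", "final_note"},
    "developer": {"transcript", "raw_audio", "final_note"},
}


def access_violations(observed: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (role, resource) pairs where a role could read something it should not."""
    return sorted(
        (role, resource)
        for role, resources in observed.items()
        for resource in resources & FORBIDDEN.get(role, set())
    )


# What each test account could open during the access test (made-up results).
observed = {
    "receptionist": {"schedule", "transcript"},
    "therapist": {"transcript", "final_note"},
    "support_engineer": {"ticket_metadata"},
}

print(access_violations(observed))
```

Roles that are absent from the deny list, such as the therapist here, are not checked, so the deny list itself needs review whenever the vendor adds a role.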

Encryption, key management, and log hygiene

Encryption is easy to claim and harder to audit. Ask for the boring specifics:

encryption in transit
encryption at rest
key management ownership
rotation policy
whether customer-managed keys are supported
whether logs may contain PHI
whether support tooling redacts content by default

Log hygiene is where a lot of AI tools leak data. The product may redact the visible UI but still write full prompts or transcript fragments into tracing systems. If that happens, your “deletion” story is already broken.

A useful rule: if logs can reconstruct a session, treat them as sensitive records.
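If the vendor can share sample log lines from a synthetic session, a quick scan shows whether content fields are being written. The sketch below assumes JSON-formatted log lines and a hypothetical set of field names; it only inspects top-level keys, so nested payloads would need a deeper walk.

```python
import json

# Hypothetical field names that would carry session content if present.
CONTENT_KEYS = {"prompt", "transcript", "completion", "note_text"}


def lines_with_content(log_lines: list[str]) -> list[tuple[int, list[str]]]:
    """Return (line_number, content_keys) for JSON log lines carrying non-empty content fields."""
    findings = []
    for number, line in enumerate(log_lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON lines need a separate review
        if not isinstance(record, dict):
            continue
        leaked = sorted(key for key in CONTENT_KEYS if record.get(key))
        if leaked:
            findings.append((number, leaked))
    return findings


# Made-up log sample from a synthetic session.
sample_logs = [
    '{"event": "note.generated", "session_id": "s-1", "latency_ms": 812}',
    '{"event": "llm.call", "session_id": "s-1", "prompt": "Summarize: client reports trouble sleeping"}',
    "plain text line that is not JSON",
]

print(lines_with_content(sample_logs))
```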

Redaction, minimization, and least-data collection

The best compliance control is not a legal clause. It is collecting less data in the first place.

Ask whether the product can:

limit recording windows
redact names, phone numbers, addresses, and insurance IDs
avoid storing raw audio when only the final note is needed
suppress unnecessary metadata
separate therapist workflow data from patient content

Minimization matters because it reduces the blast radius of a breach and makes retention easier to reason about. If a vendor needs full audio plus full transcript plus full prompts to generate a note, that should be a deliberate tradeoff, not an accident.
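As an illustration of what pattern-based redaction can and cannot do, here is a minimal sketch. The phone pattern covers only one common US layout, the insurance ID format is invented, and names and addresses are deliberately left out because they generally need more than regular expressions. It is useful for probing a vendor's claims, not as a de-identification method.

```python
import re

PATTERNS = {
    # US-style phone numbers such as 555-867-5309 or 555.867.5309.
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    # Hypothetical insurance member ID format: three letters and nine digits.
    "MEMBER_ID": re.compile(r"\b[A-Z]{3}\d{9}\b"),
}


def redact(text: str) -> str:
    """Replace each pattern match with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Call 555-867-5309 re: member XYZ123456789."))
# prints: Call [PHONE] re: member [MEMBER_ID].
```

Running a few synthetic strings like this through the vendor's own redaction feature is a quick way to see which identifier types it actually catches.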

Build a practical audit workflow for a therapy deployment

Questions to ask the vendor before procurement

I like to send a short, direct questionnaire before anyone signs paper. The goal is not to win an argument. The goal is to force a concrete architecture answer.

Ask:

Do you process PHI on behalf of a covered entity?
Will you sign a BAA, or a DPA that includes the required BAA terms, for PHI handling?
Do you store raw audio, transcripts, prompts, drafts, and final notes?
What is the default retention period for each data type?
Can we delete each data type separately?
Do you use customer content for model training or human review?
Which subprocessors can access content?
Where is content stored and processed?
What security certifications or independent reports do you have?
How do you notify customers of incidents and subprocessor changes?

If the vendor answers in vague generalities, keep pushing until the answer becomes operational.

Evidence to request from the security and legal teams

You do not need a 200-question procurement checklist to get signal. You need the right artifacts.

Evidence	What it should tell you
Signed BAA or DPA	Legal authority and permitted processing
Subprocessor list	Third-party exposure and change notice
Data retention policy	What gets kept, where, and for how long
Incident response summary	How fast the vendor responds to breaches
Security architecture overview	Isolation, encryption, and logging controls
Deletion or export documentation	Whether data lifecycle controls are real
Independent assessment	Whether claims were reviewed by a third party

If they offer a SOC 2 report, read it for scope. A report can be useful and still miss the exact service you care about. The question is whether the note-taking workflow, not just the parent company, is inside the audited boundary.

Safe test cases that do not expose real patient records

Use synthetic data and controlled sessions. Good test cases include:

a fake appointment with no real names
a transcript containing one unique test phrase
a note generated from a canned script
a deliberate deletion test
a role-based access test with low-privilege and admin accounts

A safe verification sequence looks like this:

Create a test workspace.
Run a synthetic therapy-style session.
Confirm the transcript, draft, and final note appear where expected.
Delete the session.
Search the UI, API, exports, and support view for the unique test phrase.
Ask the vendor to explain any remaining traces.

That gives you a concrete result without exposing actual patient content.

Red flags that should stop a rollout

Vague retention claims

If the vendor says “we retain data only as needed” and cannot name the data classes, retention windows, or deletion path, stop. That is not a control.

No BAA or no willingness to sign one

If the vendor refuses a BAA, or says it is “not necessary” despite touching identifiable health data, the discussion is over. You do not have a compliant deployment without contractual coverage.

Hidden training use, weak deletion, or undisclosed subprocessors

Any one of these should trigger escalation:

customer content may train the model by default
deleted notes still appear in backups or support tools with no purge policy
the vendor will not name its transcription or LLM providers
the subprocessor list changes without notice
there is no way to isolate one customer’s content from another’s support access

These are not small paperwork issues. They are signs that the vendor has not separated product convenience from sensitive-data handling.

What a defensible deployment looks like

Contract, technical, and operational controls working together

A defensible rollout is not one magic document. It is a stack:

a signed BAA or appropriate DPA
a defined PHI flow with minimal collection
explicit retention and deletion controls
disclosed subprocessors
verified access controls and audit logs
a tested incident response path
a no-training-by-default rule for customer content

If any layer is missing, the whole posture weakens. The product may still be useful, but you cannot call it boringly safe.

Monitoring after launch and periodic re-audit

The audit does not end at procurement. Vendors change infrastructure, add subprocessors, update model providers, and adjust logging behavior.

Re-audit on a schedule:

after major product releases
after a subprocessor change
after retention or deletion changes
after a security incident
after a new integration is added

I also recommend a standing review of:

access logs
deletion verification samples
subprocessor notices
contract amendments
support access exceptions

That turns the deployment from a one-time approval into a living control.
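Part of that standing review can be automated by keeping dated snapshots of the vendor's published subprocessor list and diffing them. The sketch below assumes each snapshot is reduced to a set of legal entity names; the names are fictional.

```python
def subprocessor_changes(previous: set[str], current: set[str]) -> dict[str, list[str]]:
    """Compare two snapshots of subprocessor names and report additions and removals."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Fictional snapshots taken at two review dates.
january = {"ExampleSpeech Inc.", "ExampleCloud LLC"}
april = {"ExampleSpeech Inc.", "ExampleCloud LLC", "ExampleLLM Corp."}

changes = subprocessor_changes(january, april)
if changes["added"] or changes["removed"]:
    print("Re-audit trigger:", changes)
```

An addition that arrived without a notice from the vendor is worth raising on its own, because it means the change-notification process did not work.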

Conclusion: useful note-taking is not the same as compliant note-taking

AI can absolutely help a therapist draft notes faster, but speed is not the compliance bar. The real question is whether the vendor can prove what data it touches, why it can touch it, how long it keeps it, who can see it, and what happens when something goes wrong.

If you audit the business associate status, the contract, retention, subprocessors, and security controls together, you get a real answer. If you only look at the product UI, you are hoping the backend agrees with the marketing.