Which Ones Actually Have BAAs (And Which I Wouldn’t Touch with PHI)
Last week, I broke down what HIPAA compliance really means when you’re building AI systems.
Not the checkbox version.
The real version.
Who sees the data.
Where it goes.
How it’s protected.
If you haven’t read that yet, go back and do it. Because this post won’t make sense without that foundation.
This one is different.
This is where we stop talking theory… and start naming names.
Because this is the question I keep getting:
“Okay, Avi… but what tools can I actually use?”
Fair question.
And honestly? Most of the internet gets this wrong.
So here’s the no-BS version based on what we’re actually putting into production.
Let’s Start Where Everyone Starts: The LLMs
This is where most teams focus first.
Also where they make their first mistake.
What Actually Works
OpenAI API (Enterprise)
Yes, they will sign a BAA.
No, not on the default API plan.
You need to go through enterprise. You email them, explain the use case, and they evaluate it.
The big advantage?
Zero-retention endpoints.
That alone changes the game. Your prompts and outputs don’t stick around.
ChatGPT consumer = not HIPAA
API enterprise with BAA = different story
Anthropic Claude (Enterprise / via Cloud Providers)
Same idea.
Claude Enterprise supports BAAs, but they’re selective about features.
What I like here is the safety layer. For clinical or legal-medical use cases, that matters more than people think.
Also, you can access Claude through:
- AWS Bedrock
- Google Cloud
Which means the BAA you sign with your cloud provider can cover Claude too, as long as you stay inside the services that BAA lists.
That’s powerful.
Google Vertex AI (Gemini)
If you’re already in Google Cloud, this is easy.
Google signs BAAs as part of their cloud agreements.
No extra gymnastics.
Azure OpenAI
Same story.
If your client is already deep in Microsoft, this is probably the smoothest path.
Honestly, sometimes the “best” model is the one that gets approved fastest by compliance.
AWS Bedrock
This one’s my favorite for flexibility.
Single BAA.
Multiple models.
You don’t have to negotiate with five vendors. AWS handles the umbrella.
If your client hates vendor lock-in, this is usually where I go.
xAI / Grok
Emerging.
They will sign BAAs in some cases, but it’s not as standardized yet.
I wouldn’t call it default-safe… but I am watching it closely.
What I Avoid
Anything that:
- Says “secure” but avoids the word BAA
- Is a wrapper over another API without clarity
- Is a startup that can’t explain data flow in one diagram
I’ve rejected multiple tools recently just on this basis.
Because once PHI is involved, guessing is not a strategy.
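One way to take the guessing out is to enforce it in code. Here is a minimal sketch, assuming you keep a list of hosts that are covered by a signed BAA and check every outbound URL against it before PHI leaves your service. The hostnames and the function name `assert_phi_destination` are hypothetical placeholders, not real vendor endpoints.

```python
from urllib.parse import urlparse

# Hypothetical hosts: replace with the endpoints named in your signed BAAs.
BAA_COVERED_HOSTS = frozenset({
    "llm.covered-vendor.example",
    "vectors.covered-vendor.example",
})


def assert_phi_destination(url: str) -> str:
    """Return the URL unchanged if PHI may be sent to it, else raise."""
    parsed = urlparse(url)
    # Plain HTTP is rejected outright, whoever the host is.
    if parsed.scheme != "https":
        raise PermissionError(f"PHI must travel over HTTPS: {url}")
    host = (parsed.hostname or "").lower()
    # Anything not explicitly on the list is treated as not covered.
    if host not in BAA_COVERED_HOSTS:
        raise PermissionError(f"No BAA on file for host: {host or url}")
    return url
```

The point of the design is the default: a new wrapper or startup API fails closed until someone adds it to the list, which should only happen after the BAA is signed.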
The Layer Most People Underestimate: Vector Databases
This is where things quietly break compliance.
Because people assume:
“Hey, it’s just embeddings.”
No.
Embeddings can often be inverted to approximate the original text, or linked back to a patient through the metadata stored next to them.
So treat them as PHI.
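In practice, "treat them as PHI" can be as simple as making the embedding inherit the classification of its source text. A minimal sketch, assuming a hypothetical `contains_phi` flag on each chunk and a stand-in `fake_embed` function in place of a real embedding model:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Chunk:
    text: str
    contains_phi: bool


@dataclass(frozen=True)
class VectorRecord:
    vector: List[float]
    contains_phi: bool  # copied from the source chunk, never downgraded


def embed_chunk(chunk: Chunk, embed: Callable[[str], List[float]]) -> VectorRecord:
    """Embed a chunk and carry its PHI flag over to the vector."""
    return VectorRecord(vector=embed(chunk.text), contains_phi=chunk.contains_phi)


def check_vector_write(record: VectorRecord, store_has_baa: bool) -> None:
    """Refuse to write a PHI-derived vector to a store without a BAA."""
    if record.contains_phi and not store_has_baa:
        raise PermissionError("PHI-derived embedding cannot go to a store without a BAA")


def fake_embed(text: str) -> List[float]:
    # Stand-in for a real embedding model, only here to keep the sketch runnable.
    return [float(len(text)), float(text.count(" "))]


record = embed_chunk(Chunk("Patient reports chest pain", contains_phi=True), fake_embed)
check_vector_write(record, store_has_baa=True)  # allowed: the store is covered
```

The same record passed with `store_has_baa=False` raises, which is exactly the behavior you want from a vector that started life as a clinical note.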
What I Actually Trust
Pinecone
Mature.
BAA available.
Works across clouds.
This is the “I don’t want surprises” option.
Weaviate Enterprise Cloud
Strong architecture.
Good healthcare awareness.
I’ve had good experiences with teams that actually understand compliance here.
Qdrant Cloud
Solid performance.
HIPAA + SOC 2 setups available.
Good option for high-scale retrieval.
What I Avoid
Chroma (hosted)
They literally say it’s not designed for HIPAA.
That should end the conversation.
Random Open Source (Hosted Elsewhere)
If you don’t control:
- Hosting
- Encryption
- Access
You don’t control compliance.
When Self-Hosting Makes Sense
For some clients, this is the only acceptable answer.
- pgvector on Postgres
- Milvus
- Custom setups
Hosted on AWS / Azure / GCP under BAA.
More work.
But full control.
The Backbone: Cloud Infrastructure
This is the part nobody gets excited about.
Also the part that actually saves you in audits.
The Safe Defaults
- AWS
- Google Cloud
- Azure
All offer BAAs.
All cover, for the services each provider lists as in scope for its BAA:
- Storage
- Compute
- Databases
- Logging
The Mistake I Keep Seeing
One service is compliant.
Another is not.
Data flows between them anyway.
And nobody notices until audit time.
Example:
- Secure storage ✔️
- Secure vector DB ✔️
- Non-compliant embedding API ❌
That breaks the chain.
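You can catch this before the auditor does by writing the data flow down and checking every hop. A rough sketch, assuming you maintain a small inventory of services with a `has_baa` flag; the service names and the `find_chain_breaks` helper are illustrative, not part of any real tool:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Service:
    name: str
    has_baa: bool  # True only if a signed BAA covers this service


def find_chain_breaks(
    flow: List[Tuple[str, str]], services: Dict[str, Service]
) -> List[Tuple[str, str, str]]:
    """Return (source, dest, offender) for every hop touching a service without a BAA."""
    breaks = []
    for source, dest in flow:
        for name in (source, dest):
            if not services[name].has_baa:
                breaks.append((source, dest, name))
    return breaks


services = {
    "storage": Service("storage", has_baa=True),
    "vector_db": Service("vector_db", has_baa=True),
    "embedding_api": Service("embedding_api", has_baa=False),
}

# PHI moves from storage to the embedding API, then on to the vector DB.
flow = [("storage", "embedding_api"), ("embedding_api", "vector_db")]

breaks = find_chain_breaks(flow, services)
for source, dest, offender in breaks:
    print(f"{source} -> {dest}: {offender} has no BAA")
```

Both hops get flagged here, because the one non-compliant embedding API sits in the middle of the flow. Two compliant endpoints do not make a compliant path.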
So What Stack Am I Actually Recommending?
Let me make this concrete.
Safe + Fast Setup
- Cloud: AWS or Azure
- LLM: OpenAI API (enterprise) or Claude via Bedrock
- Vector DB: Pinecone or Weaviate
- Storage: S3 / Blob with encryption
- Logging: Native cloud tools
This passes audits.
This scales.
This is what we deploy.
High-Control Setup
- Self-hosted models
- Self-hosted vector DB
- Full cloud control
More effort.
But zero dependency risk.
A Few Hard Truths (From Real Projects)
Let me save you some pain.
- SOC 2 ≠ HIPAA
- API key ≠ encryption
- “Enterprise-ready” ≠ BAA available
- Embeddings ≠ anonymous
And the big one:
The cheapest stack almost always becomes the most expensive later.
Why This Actually Matters for Your Business
This isn’t just compliance.
It’s positioning.
When you can walk into a room and say:
- Here’s our stack
- Here are the signed BAAs
- Here’s how data never leaves the compliant boundary
You’re not just another vendor.
You’re the safe choice.
And in healthcare?
Safe wins.
What’s Coming Next
In Part 3, I’m going even deeper.
Real architecture.
Real flows.
Real configs.
The exact system we used that passed audit.
If you’re building something right now and not sure whether it will pass compliance…
Fix it now.
Not when legal calls you.
I’m Avi Kumar at Kuware.
We don’t sell “HIPAA-ready” slides.
We build systems that actually survive audits.