Welcome to the Third Place

A digital space for thinking out loud about AI, data, and the infrastructure underneath.

This is where I cultivate ideas that aren't quite crystallized into finished pieces but are too developed for a private notebook; they live somewhere in between.


01

Highlights

SELECTED WORK

Curated pieces that have been cultivated, pruned, and polished.


02

The Garden

LIVING KNOWLEDGE BASE

750+ interconnected concepts growing across 7 domains. This isn't a blog—it's a network of ideas that evolve, connect, and reveal patterns over time.

750+ Concepts
7 Domains
Human-Centered Design
Information Architecture
Research Methods
Data Engineering
AI Mechanisms
Knowledge Engineering
Cross-Domain
Enter the Garden: garden-heymhk.com

03

AI: From the Engine Room

A DATA LEADER'S PERSPECTIVE ON MODERN AI

13 articles from someone who has worked inside AI mechanisms rather than theorized about them from the outside. Practitioner credibility over pundit speculation.

"These systems are more comprehensible than marketing suggests, and more limited than hype implies. Both things are true simultaneously."

Pillar 1

AI Mechanisms

Engine Room Context: If you read my recent LinkedIn post, you know the backstory: I worked on attention-based neural networks in 2018-2020, before ChatGPT made "transformers" a household word. That experience—recognizing the machinery inside today's AI headlines—is what this series builds from.

Why This Series

There's a lot of excellent AI coverage available—from researchers explaining breakthroughs to executives sharing implementation stories. What I've found harder to find is the middle layer: practical explanations of how these systems work that connect to real decisions about data, governance, and interfaces.

That's the gap this series tries to fill. Not because other perspectives are wrong, but because this one might be useful to people navigating similar questions.

The engine room isn't a better vantage point than the bridge—just a different one, with different things visible.

What You'll Find Here

The series covers three areas over thirteen articles:

AI Mechanisms (Articles 1-5): How attention, training, and context actually work. The goal is intuition, not exhaustive technical detail.

The Proprietary Data Paradox (Articles 6-9): Why data strategy is harder than it looks. Knowledge architecture, tacit expertise, interface design.

Forward-Looking Governance (Articles 10-13): Hallucination, effective prompting, why AI readiness is a governance question, and what it all adds up to.

The Takeaway: Understanding how something works changes the questions you ask. That's what I'm hoping this series provides—better questions, not final answers.

The Core Idea

Before attention mechanisms, neural networks for language had a sequencing problem. They processed text word by word, compressing everything into a fixed-size representation. By the end of a long sentence, early information had degraded.

Attention addressed this by letting models look back at all previous inputs and compute relevance weights dynamically. For each output, the model calculates how much to weight each input—which parts to "attend to" for this particular task.

Attention lets models compute relevance weights dynamically—deciding which inputs matter for each specific output.
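To make "relevance weights" concrete, here is a minimal sketch of scaled dot-product attention in plain NumPy. The toy inputs are invented; production models add learned projections, masking, and many attention heads.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(queries, keys, values):
    """For each output position, weight every input by computed relevance."""
    d_k = keys.shape[-1]
    # Relevance scores: how much each query "attends to" each input position.
    scores = queries @ keys.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)           # each row sums to 1
    # The output is a relevance-weighted blend of the value vectors.
    return weights @ values, weights

# Toy example: 3 input positions, 4-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(x, x, x)
print(weights.round(2))  # row i: how much position i attends to every position
```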

The Transformer Architecture

The 2017 "Attention Is All You Need" paper showed you could build powerful language models using attention as the core mechanism. That architecture—the Transformer—underlies GPT, Claude, Gemini, and most of the models in current use.

The Capabilities and Limits

Attention enables impressive capabilities. Models can track pronouns back to their antecedents, maintain coherence across long passages, and pick up on subtle contextual cues.

The limits are equally important. Attention operates on learned patterns from training data. It excels at tasks that resemble what it's seen. It struggles with genuinely novel reasoning that requires going beyond those patterns.

Practical Implications

Impressive demos may not generalize. A demo task that closely matches training patterns can look flawless; if your use case matches those patterns less well, performance can drop.

Context placement matters. Where you put information in a prompt affects how the model weights it.

Fine-tuning shifts attention, not knowledge. Fine-tuning adjusts what patterns the model prioritizes—it can make the model more likely to respond in a particular style, format, or domain vocabulary. But it doesn't add new reasoning capabilities or factual knowledge that wasn't implicit in the base model.

The Takeaway: Attention is a powerful pattern-matching mechanism. Understanding this helps predict where models will excel and where they might struggle.

The Cost Structure

Training a large language model from scratch involves substantial compute costs—tens of millions of dollars for frontier models. But compute is just one component.

You also need large, clean datasets (scarce for most domains), ML engineering talent comfortable with distributed training, infrastructure that can handle the workload, and time for iteration and debugging.

Compute costs get the headlines, but data quality, talent, and iteration time often determine whether training is viable.

What Training Actually Produces

Training compresses patterns from data into model weights. The model learns statistical relationships between tokens—what tends to follow what, in what contexts. This compression is lossy. Information gets generalized, blended, and sometimes lost.
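As a rough intuition for "what tends to follow what," here is a toy bigram model: it compresses a corpus into follow-word counts, which is enough to produce plausible continuations but not enough to reconstruct the original sentences. Real models learn far richer representations; this is only an analogy for lossy statistical compression.

```python
from collections import Counter, defaultdict

corpus = (
    "the model learns patterns from data . "
    "the model compresses patterns into weights . "
    "training compresses data into patterns ."
).split()

# "Training": count which token tends to follow which.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

# "Inference": pick the most likely continuation for a token.
def most_likely_next(token):
    return follows[token].most_common(1)[0][0]

print(most_likely_next("the"))          # 'model' -- a learned statistical tendency
print(most_likely_next("compresses"))   # ties here show distinct sources blended together
```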

The Fine-Tuning Option

Most organizations don't need to train from scratch—they can adapt existing models through fine-tuning. This is more accessible, but comes with its own tradeoffs.

Fine-tuning shifts focus and style effectively. It's less effective at adding genuinely new capabilities that weren't in the base model.

The RAG Alternative

Retrieval-Augmented Generation takes a different approach: instead of encoding knowledge in weights, keep it external and retrieve relevant information at query time. This is often more practical for proprietary knowledge—cheaper than training, easier to update, more controllable.
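A minimal sketch of the retrieve-then-generate flow. The embed() function here is a placeholder that returns random unit vectors (a real embedding model captures semantics), and the final model call is stubbed out; the point is that knowledge stays in an external store and gets injected into the prompt at query time.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=64)
    return v / np.linalg.norm(v)

documents = [
    "Refund policy: purchases can be returned within 30 days.",
    "Shipping: orders over $50 ship free within the EU.",
    "Support hours: weekdays 9am-5pm CET.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 2):
    scores = doc_vectors @ embed(query)          # cosine similarity on unit vectors
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return prompt  # in a real system: return llm(prompt)

print(answer("What is the return window?"))
```

Updating knowledge then means updating documents and re-embedding them, not retraining anything.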

The Takeaway: Training economics shape what's viable. Understanding the full cost structure—not just compute—helps identify realistic approaches.

The Benchmark Question

AI models are evaluated on standardized benchmarks—tests measuring specific capabilities like reasoning, coding, or factual recall. Benchmarks are useful for comparison, but they measure what's measurable, which isn't always what matters for a given application.

Benchmarks measure what's measurable, which isn't always what matters for your specific application.

When Benchmarks Mislead

A concrete example: a team I know evaluated models for technical documentation. The smaller, more efficient model scored nearly identically to the larger one on coding benchmarks. In production, the difference became clear.

The larger model correctly inferred that "the system" in paragraph four referred to a specific microservice mentioned three pages earlier. The efficient model treated it as a generic reference and gave plausible but wrong instructions.

Sources of Efficiency Gains

Quantization reduces precision of weights, making models smaller and faster. This works well up to a point, then subtly degrades capability.

Distillation trains smaller models to mimic larger ones. Effective, but the student typically doesn't fully match the teacher on edge cases.

Architecture innovations genuinely do more with less. Real advances, though often incremental.
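To make the quantization tradeoff above concrete, here is a toy sketch of symmetric int8 quantization of a weight matrix: a quarter of the memory, at the cost of small rounding errors that can accumulate across layers.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)

# Symmetric int8 quantization: map the float range onto [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)       # stored as 1 byte per weight
dequantized = q.astype(np.float32) * scale          # reconstructed at runtime

print("bytes:", weights.nbytes, "->", q.nbytes)     # 4x smaller
print("max rounding error:", np.abs(weights - dequantized).max())
```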

The Takeaway: Efficiency gains are real, but they come with tradeoffs. Evaluate against your actual requirements, not benchmark headlines.

How Context Works

A context window is the text the model can process when generating a response. It includes system prompts, conversation history, any documents you've included—everything the model can "see" for this particular response.

Models are stateless. Each response is generated fresh from the current context. Previous conversations aren't remembered unless explicitly included.
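Because models are stateless, every call has to assemble its own context explicitly. Here is a hedged sketch of what that looks like; the token counting is a crude whitespace approximation, and the actual model call is left as a placeholder.

```python
def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())

def build_context(system_prompt, history, documents, budget=4000):
    """Assemble everything the model will 'see' for this one response."""
    parts = [system_prompt]
    used = approx_tokens(system_prompt)
    # Include relevant documents up to the token budget.
    for doc in documents:
        cost = approx_tokens(doc)
        if used + cost > budget:
            break
        parts.append(doc)
        used += cost
    # Recent conversation turns: anything we leave out is simply not remembered.
    parts.extend(history[-6:])
    return "\n\n".join(parts)

context = build_context(
    system_prompt="You are a documentation assistant.",
    history=["User: How do I rotate the API key?"],
    documents=["API keys are rotated from the admin console under Settings."],
)
# send_to_model(context)  # placeholder for the actual model call
```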

Bigger Windows, Different Tradeoffs

Context windows have grown substantially—from a few thousand tokens to 100K+ in some models. This sounds like pure improvement, but there are considerations.

Longer contexts are slower and more expensive to process. More importantly, research shows models don't use long contexts uniformly—information in the middle tends to get less attention than content at the beginning and end.

Practical Implications

Be explicit about what's needed. Don't assume the model remembers anything from outside the current context.

Position matters. Put the most important information at the beginning of your context.

More isn't always better. Selective, relevant context often outperforms comprehensive context.

The Takeaway: Context is working memory, not long-term memory. Design systems that explicitly manage what the model sees for each response.

Pillar 2

The Proprietary Data Paradox

The Data Advantage Assumption

Many organizations assume their proprietary data is their AI advantage. "We have twenty years of customer data." "Our operational data is unique." This assumption is understandable—and often incomplete.

Proprietary data frequently has challenges: inconsistent formats, implicit assumptions that made sense to creators but aren't documented, gaps that weren't problems for original use cases but matter for AI applications.

Proprietary data is often only a potential advantage. Reference data is what converts that potential into something usable.

What Reference Data Provides

Taxonomies and ontologies provide standard categorization schemes that enable comparison across different data sources.

Entity registries disambiguate references—connecting 'IBM' and 'International Business Machines' to the same entity.

Relationship schemas define standard ways of expressing connections between entities.
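A minimal illustration of the entity-registry idea: a small alias table that maps messy proprietary references onto one canonical identifier, so records from different sources become comparable. The identifiers and aliases are made up for illustration.

```python
# Hypothetical entity registry: canonical ID -> known aliases.
REGISTRY = {
    "org:ibm": {"IBM", "International Business Machines", "I.B.M."},
    "org:ge":  {"GE", "General Electric"},
}
ALIAS_TO_ID = {alias.lower(): eid
               for eid, aliases in REGISTRY.items()
               for alias in aliases}

def resolve(name: str):
    """Map a raw reference from proprietary data onto a canonical entity."""
    return ALIAS_TO_ID.get(name.strip().lower())

crm_records = ["International Business Machines", "IBM", "General Electric"]
print([resolve(r) for r in crm_records])   # ['org:ibm', 'org:ibm', 'org:ge']
```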

The Multiplier Mechanism

Proprietary data provides facts. Reference data provides relationships. Relationships enable computation that creates insight.

The Takeaway: Reference data multiplies the value of proprietary data by making it connectable and computable. The mapping work is where value gets created.

The Articulation Gap

Ask experts how they make decisions and you'll typically get an incomplete picture. They'll describe the factors they consciously consider. What they often can't articulate: the pattern recognition that happens below conscious awareness.

Experts know things they can't fully articulate. That tacit knowledge is often their most valuable contribution—and the hardest to capture.

The AI Training Problem

AI systems learn from data. If knowledge isn't captured in data, models can't learn it. Tacit knowledge, by definition, isn't in the data—it's in the experts who create and interpret the data.

More Promising Approaches

Decision logging: Capture not just outcomes but the specific inputs that led to them.

Structured disagreement: When experts disagree, exploring why often surfaces tacit criteria neither would articulate unprompted.

Schema co-creation: Building data models with experts forces articulation of what entities and relationships matter.
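One way to make decision logging concrete: a small record structure that captures the specific inputs an expert actually looked at, alongside the outcome and stated rationale, so tacit criteria have somewhere to land. The fields below are a hypothetical starting point, not a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    decision: str              # what was decided
    inputs: dict               # the specific signals consulted
    rationale: str             # the expert's stated reasoning
    confidence: str            # e.g. "high" / "medium" / "low"
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

record = DecisionRecord(
    decision="Reject batch 4417",
    inputs={"supplier": "changed last quarter",
            "transport_temp": "spiked twice in transit"},
    rationale="Pattern resembles earlier failures, even though lab values are in range.",
    confidence="medium",
)
```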

The Takeaway: Tacit knowledge is real and valuable. Capturing it requires methods designed for knowledge that resists articulation.

How Vector Search Works

Vector embeddings convert text into numerical representations that capture semantic meaning. Similar concepts end up as nearby vectors, enabling "find things like this" queries.

Vectors excel at similarity. But expertise often means connecting things that aren't similar at all.

The Similarity Limitation

Consider a food scientist investigating a failed batch. The relevant connection might be between this batch and one from three years ago that used a different supplier but had similar temperature fluctuations during transport. Nothing about these batches would look "similar" to a vector search.

Knowledge Graphs as Complement

Knowledge graphs excel at exactly what vectors miss: explicit, typed relationships that traverse domains. "This ingredient comes from this supplier who also supplies that product which had this quality issue."
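A sketch of the kind of traversal vectors can't do: following explicit, typed edges across domains. The graph below is a tiny invented example, but the query mirrors the supplier chain described above.

```python
from collections import defaultdict, deque

# Tiny hypothetical knowledge graph: (subject, relation, object) triples.
triples = [
    ("ingredient:citric_acid", "supplied_by", "supplier:acme"),
    ("supplier:acme", "supplies", "product:lemon_soda"),
    ("product:lemon_soda", "had_issue", "quality_issue:Q-2022-17"),
]
edges = defaultdict(list)
for s, rel, o in triples:
    edges[s].append((rel, o))

def paths_from(start, max_hops=3):
    """Walk typed relationships outward from an entity, breadth-first."""
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if path:
            yield path
        if len(path) < max_hops:
            for rel, nxt in edges[node]:
                queue.append((nxt, path + [(node, rel, nxt)]))

for path in paths_from("ingredient:citric_acid"):
    print(" -> ".join(f"{s} [{r}] {o}" for s, r, o in path))
```

None of these entities are "similar" in embedding space; the connection exists only because the relationships are stated explicitly.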

The Takeaway: Vectors and graphs complement each other. Vectors find similar things; graphs traverse relationships. The most powerful systems use both.

The Uniform Confidence Problem

Complex systems tend to present all outputs with equal confidence. A result backed by authoritative data looks the same as one derived from algorithmic inference. Users reasonably trust what's presented—they can't see the uncertainty underneath.

Systems speak with uniform confidence regardless of how well-grounded their outputs are. Interface design determines whether users can see the difference.

Designing the Human-in-the-Loop

Provenance visibility: Every data point can be traced back to its source.

Constraints as interface elements: Acceptable ranges are visible, not hidden.

Drill-down by default: Every aggregate is explorable.

Visual confidence encoding: Confidence levels get consistent visual treatment.
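As a sketch of what provenance visibility and confidence encoding can mean in practice, here is a hypothetical display value that carries its source and confidence with it, so the interface can render grounded and inferred numbers differently. The field names and markers are invented.

```python
from dataclasses import dataclass

@dataclass
class DisplayValue:
    value: float
    source: str          # where this number came from
    confidence: str      # "measured", "derived", or "inferred"

    def render(self) -> str:
        # Consistent visual treatment per confidence level (here, a text marker).
        marker = {"measured": "", "derived": " ~", "inferred": " ?"}[self.confidence]
        return f"{self.value:,.0f}{marker}  (source: {self.source})"

print(DisplayValue(120_400, "billing_db.invoices", "measured").render())
print(DisplayValue(131_000, "forecast_model_v3", "inferred").render())
```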

The Visualization Skepticism Principle

Network visualizations are particularly seductive. A beautiful graph makes patterns feel discovered and real—even when those patterns depend on arbitrary parameter choices.

If a pattern survives across different threshold settings, it's probably real. If it vanishes when you tweak a parameter, it was probably an artifact.

The Takeaway: Interface design determines whether users can appropriately calibrate trust. Make uncertainty visible, not hidden.

Pillar 3

Forward-Looking Governance

How Text Generation Works

Language models are next-token predictors. Given a sequence of text, they generate the most likely continuation based on patterns learned from training data.

Here's the important part: 'most likely' doesn't mean 'true.' The model doesn't have a concept of truth—it has a concept of plausibility.

The model doesn't have a concept of truth—it has a concept of plausibility. Hallucination isn't a malfunction; it's inherent to probabilistic text generation.
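A toy illustration of "plausible, not true": given scores (logits) over candidate next tokens, the model converts them to probabilities and samples. Nothing in this step checks facts; it only ranks continuations by likelihood. The tokens and scores below are invented.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical logits for the next token after "The capital of Australia is".
candidates = ["Sydney", "Canberra", "Melbourne"]
logits = np.array([2.1, 1.9, 0.4])   # 'Sydney' appears often in text, so it scores high

probs = softmax(logits)
rng = np.random.default_rng(0)
print(dict(zip(candidates, probs.round(2))))
print("sampled:", rng.choice(candidates, p=probs))
# The sampling step optimizes plausibility, not truth: the wrong answer
# can easily be the most probable one.
```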

Why It Can't Be Eliminated

Hallucination is inherent to probabilistic text generation. Mitigation strategies can reduce frequency but not eliminate it.

The Governance Implication

If hallucination can't be eliminated technically, it becomes a governance problem: how do you manage systems that will sometimes confidently produce incorrect outputs?

Instead of 'how do we fix hallucination?' the better question might be 'for which use cases is this error rate acceptable?'

The Takeaway: Hallucination is a property of probabilistic text generation, not a bug waiting to be fixed. Governance is about managing that property, not waiting for it to disappear.

The Real Skill: Knowing When to Stop

The most valuable prompting skill isn't crafting better prompts. It's recognizing when prompting isn't the answer.

If you're on your fifth iteration of a prompt, trying to get reliable behavior for a critical task, that's a signal. The task might not be well-suited for current LLMs.

What Tends to Work

Format specificity: 'Return a JSON object with these fields' eliminates ambiguity about output structure.

Examples over descriptions: Showing what you want often works better than explaining it.

Task decomposition: Breaking complex tasks into steps often improves results.
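Here is a sketch of what format specificity and examples-over-descriptions look like in a prompt. The field names and the worked example are placeholders; the pattern is the point.

```python
# Format specificity plus a worked example, with the actual input appended at the end.
PROMPT_PREFIX = """\
Extract the following fields from the support ticket and return ONLY a JSON object
with exactly these keys: "product", "severity" (one of: low, medium, high), "summary".

Example input:
  The export button in ReportBuilder crashes the app every time.
Example output:
  {"product": "ReportBuilder", "severity": "high", "summary": "Export button crashes the app"}

Ticket:
"""

def build_prompt(ticket: str) -> str:
    return PROMPT_PREFIX + ticket.strip() + "\n"

print(build_prompt("Dashboard loads slowly on Mondays."))
```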

Constraints Beat Personas

Giving the model a persona ('You are an expert marketer') is popular. It works sometimes, but has a failure mode: the model may confidently perform the persona even outside its actual competence.

Constraints often work better than personas. 'You are an expert' makes the model confident; constraints help it stay appropriately scoped.

The Takeaway: Prompting is interface design between human intent and machine capability. Be specific about format, use examples, decompose complexity—and know when a different approach is needed.

The Readiness Misconception

"AI readiness" typically conjures images of model selection, prompt engineering, and integration architecture. These matter. But they're late-stage concerns that assume something more fundamental: that you know what data you have and whether you can trust it.

AI readiness isn't primarily about AI. It's about whether you can answer a simple question: What data do we have, and can we trust it?

The OODA Loop for Data Governance

OBSERVE: What data exists? Before you can govern data, you need to see it.

ORIENT: Can I trust it? Knowing data exists isn't enough—you need to know its quality.

DECIDE: What should I prioritize? Not all data quality issues matter equally.

ACT: How do I improve it? Action flows from understanding, not guesswork.
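A hedged sketch of the loop applied to a table inventory: observe what exists, orient on basic quality signals, decide by ranking issues, then act by emitting concrete tasks. The inventory, fields, and thresholds are invented placeholders.

```python
# OBSERVE: a minimal inventory of what data exists (normally pulled from a catalog).
inventory = {
    "crm.customers":   {"owner": None,      "null_rate": 0.02, "documented": True},
    "ops.sensor_logs": {"owner": "plant-3", "null_rate": 0.31, "documented": False},
    "fin.invoices":    {"owner": "finance", "null_rate": 0.05, "documented": True},
}

# ORIENT: turn raw facts into quality signals.
def issues(meta):
    found = []
    if meta["owner"] is None:
        found.append("no owner")
    if meta["null_rate"] > 0.10:
        found.append(f"high null rate ({meta['null_rate']:.0%})")
    if not meta["documented"]:
        found.append("undocumented")
    return found

scored = {name: issues(meta) for name, meta in inventory.items()}

# DECIDE: prioritize the tables with the most problems.
priorities = sorted(scored.items(), key=lambda kv: len(kv[1]), reverse=True)

# ACT: emit concrete follow-ups instead of guesswork.
for name, found in priorities:
    for issue in found:
        print(f"TODO [{name}]: fix {issue}")
```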

The Compounding Effect

Governance work compounds in ways that aren't immediately visible. A searchable data dictionary reduces the time to answer "where does this data come from?" from hours to seconds.

The organizations that will thrive with AI aren't necessarily the ones with the most sophisticated models. They're the ones that have done the unglamorous work of knowing what they have and whether they can trust it.

The Takeaway: AI readiness is governance readiness. The work of observing your data landscape, assessing quality, and building reliable metadata infrastructure isn't preparation for AI—it's the foundation that determines whether AI initiatives succeed.

The Through-Line

If there's one thread connecting these articles, it's this: these systems are more comprehensible than marketing often suggests, and more limited than hype implies. Both things are true simultaneously.

What Changes With Understanding

If you've engaged with this series, you have a framework for evaluating claims. When someone promises transformation, you can ask useful questions: What's the training data? How are they handling context? What's the error rate?

The Infrastructure Question

Organizations that move fast by skipping foundations consistently pay for it later. Ungoverned data creates dependencies that calcify. The correction is always more expensive than doing it right initially.

The fundamentals are more stable than headlines suggest. Understanding them provides durable value across hype cycles.

The Takeaway: Understanding the machinery helps you navigate hype cycles, ask better questions, and make decisions grounded in how these systems actually work.