
Tukun.ai: A Semantic-First Data Agent

Tukun.ai is a semantic-first data agent built around governed semantics, multiple data sources, multiple LLMs, runtime skills, and reusable analysis assets.

15 May, 2026

Natural language and structured data have always lived in different worlds.

Business questions usually arrive in natural language: Why did growth slow down? What caused a metric to move? How did different channels perform? But real analysis still has to return to the structured layer: data sources, tables, fields, metric definitions, time grains, filters, access boundaries, and reusable outputs.

Tukun.ai is designed to connect those two layers.

In one sentence:

Tukun.ai is a semantic-first data agent.

It is not a simple ChatBI interface. It is a Data Agent Harness built around semantics, data sources, models, skills, and reusable analysis assets.

Why semantic-first

The hard part of a data agent is not just translating a question into SQL.

In real business work, the same metric can have different definitions, the same dimension can come from different data sources, and the same question can imply different time grains or business boundaries. If semantics are not managed first, even a strong model can answer the wrong question fluently.

That is why Tukun.ai works in this order:

Manage business semantics.
Connect data sources.
Let the agent orchestrate models and tools.
Preserve analysis outputs as reusable assets.

Metrics, dimensions, entities, and relationships are not treated as temporary prompt text. They are manageable, publishable, and traceable product objects.

Product structure

Tukun.ai is organized into several layers:

Layer	Role
Semantics	Manage metrics, dimensions, entities, relationships, and business definitions
Data	Connect PostgreSQL, files, and other business data sources, then synchronize metadata
Models	Support multiple LLMs and switch models by task instead of binding the product to one provider
Language	Support multilingual usage and carry language into runtime context
Skills	Extend repeatable analysis workflows and domain capabilities
Workbench	Handle questions, analysis, review, follow-up, and reuse
Assets	Preserve cards, charts, dashboards, skills, and semantic definitions

The goal is simple: analysis should not stop at a single answer. It should be something teams can inspect, follow up on, reuse, and improve.

A complete Data Agent Harness

The core of Tukun.ai is not a single page. It is an analysis runtime.

When a user request enters the system, the runtime is responsible for:

resolving intent
assembling prompts and context
selecting available tools and Skills
dispatching the right model
executing semantic queries or analysis tools
shaping the result into a reusable product artifact

This keeps product logic from being scattered across pages. Workbench, semantics, skills, and downstream assets are all organized around the same runtime path.
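
As a rough illustration, the runtime path above can be pictured as an ordered list of steps that share one context object. This is a minimal sketch, not Tukun.ai's actual implementation: every name here (RunContext, the step functions, the tool and model names) is hypothetical, and each step is a toy stand-in for the real logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunContext:
    """State carried through one request. All field names are hypothetical."""
    question: str
    intent: str = ""
    prompt: str = ""
    tools: list[str] = field(default_factory=list)
    model: str = ""
    rows: list[dict] = field(default_factory=list)
    artifact: dict = field(default_factory=dict)


def resolve_intent(ctx: RunContext) -> None:
    # Toy rule standing in for real intent resolution.
    ctx.intent = "diagnosis" if ctx.question.lower().startswith("why") else "lookup"


def assemble_prompt(ctx: RunContext) -> None:
    ctx.prompt = f"[intent={ctx.intent}] {ctx.question}"


def select_tools(ctx: RunContext) -> None:
    ctx.tools = ["semantic_query"]
    if ctx.intent == "diagnosis":
        ctx.tools.append("driver_analysis")


def dispatch_model(ctx: RunContext) -> None:
    # Hypothetical model names; the choice depends on the resolved intent.
    ctx.model = "reasoning-model" if ctx.intent == "diagnosis" else "fast-model"


def execute(ctx: RunContext) -> None:
    # Placeholder rows; a real runtime would run the selected tools here.
    ctx.rows = [{"channel": "search", "growth": -0.04}]


def shape_artifact(ctx: RunContext) -> None:
    ctx.artifact = {"type": "card", "title": ctx.question, "rows": ctx.rows}


# The order of this list mirrors the order of responsibilities in the text.
PIPELINE: list[Callable[[RunContext], None]] = [
    resolve_intent,
    assemble_prompt,
    select_tools,
    dispatch_model,
    execute,
    shape_artifact,
]


def run(question: str) -> RunContext:
    ctx = RunContext(question)
    for step in PIPELINE:
        step(ctx)
    return ctx
```

Calling run("Why did growth slow down?") walks every step in order and leaves a card-shaped artifact on the context, which is the shape a Workbench page or a downstream asset could consume.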

Semantic workflow

Tukun.ai uses synchronized metadata from data sources as the starting point, then uses LLMs to help generate semantic drafts that can align with MetricFlow structures.

The system does not automatically publish those definitions. The default path is AI-assisted draft generation followed by human review. The system improves the speed of initial modeling, while people keep control over business meaning.

This works well for data teams because:

metadata can enter the system automatically
semantic drafts can be generated with LLM assistance
metrics, dimensions, entities, and relationships remain editable
publish state and version history can be tracked

The point is not to make the model guess business definitions on every request. The point is to let the agent analyze on top of a more stable semantic layer.
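
To make the draft-then-review path concrete, here is a minimal sketch of a semantic definition that starts as a draft, can only be published with a named human reviewer, and keeps a version history. The class, fields, and the rule that an edit returns the definition to draft are illustrative assumptions, not Tukun.ai's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    PUBLISHED = "published"


@dataclass
class MetricDefinition:
    """Hypothetical semantic object with publish state and version history."""
    name: str
    expression: str
    status: Status = Status.DRAFT
    version: int = 0
    # Each entry is (version, expression, reviewer).
    history: list[tuple[int, str, str]] = field(default_factory=list)

    def edit(self, expression: str) -> None:
        # Assumption: any edit sends the definition back to draft for review.
        self.expression = expression
        self.status = Status.DRAFT

    def publish(self, reviewer: str) -> None:
        # Publishing is never automatic: a named human reviewer is required.
        if not reviewer:
            raise ValueError("a human reviewer is required to publish")
        self.version += 1
        self.status = Status.PUBLISHED
        self.history.append((self.version, self.expression, reviewer))
```

An LLM-generated draft would enter as a MetricDefinition in DRAFT state. Only a call to publish with a reviewer moves it forward, and each published version stays in history for traceability.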

Multiple data sources

Enterprise data rarely lives in one place.

Some data is in databases. Some is in files. Some comes from business systems or APIs. Tukun.ai is designed around multiple data sources from the beginning, so different sources can enter one analysis workflow.

In the current architecture, semantic assets and analysis context are scoped by data_source_id. This prevents metric definitions from different data sources from being mixed accidentally, and it gives the product a clear foundation for source-level governance and reuse.
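
One simple way to picture scoping by data_source_id is a registry whose keys always include the source. The sketch below is an assumption about how such scoping could look, with a hypothetical SemanticRegistry class. Only the data_source_id name comes from the architecture described above.

```python
class SemanticRegistry:
    """Hypothetical registry: every lookup is keyed by (data_source_id, name)."""

    def __init__(self) -> None:
        self._metrics: dict[tuple[str, str], str] = {}

    def register(self, data_source_id: str, name: str, expression: str) -> None:
        self._metrics[(data_source_id, name)] = expression

    def resolve(self, data_source_id: str, name: str) -> str:
        # No cross-source fallback: a metric defined for another source
        # is treated as missing rather than silently reused.
        key = (data_source_id, name)
        if key not in self._metrics:
            raise LookupError(
                f"metric {name!r} is not defined for data source {data_source_id!r}"
            )
        return self._metrics[key]
```

With this shape, two sources can both define a metric named revenue with different expressions, and a request scoped to one source can never pick up the other's definition by accident.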

Multiple LLMs

Different models are good at different jobs.

Some are better at reasoning. Some are better at tool use. Some are better for cost control. Some are more stable in specific language scenarios.

Tukun.ai treats models as configurable, governable product capabilities:

multiple providers
multiple models
plan-based model availability
default model and model preference settings
runtime refresh after configuration changes

This matters for a commercial product. Multi-model support is not only an API integration problem. It also touches billing, accounts, quotas, and usage metering for cached input, output, and reasoning output.
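
Plan-based availability, defaults, and user preferences can be combined in a small resolution function. The sketch below is illustrative only: the plan names, model names, and the fallback rule are assumptions, not Tukun.ai's real configuration.

```python
from typing import Optional

# Hypothetical configuration: which models each plan may use.
PLAN_MODELS: dict[str, list[str]] = {
    "free": ["small-model"],
    "pro": ["small-model", "reasoning-model"],
}

# Hypothetical default model per plan.
DEFAULT_MODEL: dict[str, str] = {
    "free": "small-model",
    "pro": "small-model",
}


def resolve_model(plan: str, preferred: Optional[str] = None) -> str:
    """Return the preferred model if the plan allows it, else the plan default.

    The dictionaries are read on every call, so a configuration change
    takes effect on the next request without restarting anything.
    """
    allowed = PLAN_MODELS[plan]
    if preferred in allowed:
        return preferred
    return DEFAULT_MODEL[plan]
```

A pro user who prefers reasoning-model gets it. A free user with the same preference falls back to the plan default instead of hitting an error.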

Prompt Cache-friendly context design

To control long-term usage cost, Tukun.ai uses layered prompt assembly:

Base System Prompt
Core Runtime Rules
Response Contract
Evidence Rules
Shared Memory
Skill Prompts / Skill References
Recent Turns
Turn Context

These sections are grouped as stable, semistable, and volatile.

Stable content stays fixed as much as possible. Semistable content changes with capabilities and task shape. Volatile content stays close to the current turn. This structure makes it easier to benefit from provider Prompt Cache behavior and keeps frequent analysis workflows more cost-efficient.
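
The layering can be sketched as a sort by stability tier, so that everything likely to be identical across requests forms one unchanged prefix. This is a minimal illustration under two assumptions: that the provider's cache matches on an unchanged prompt prefix, which is common but provider-specific, and that the tier assigned to each section below is only an example, since the text does not say which section belongs to which tier. The Section class and assemble function are hypothetical names.

```python
from dataclasses import dataclass

# Lower rank is placed earlier in the prompt.
STABILITY_RANK = {"stable": 0, "semistable": 1, "volatile": 2}


@dataclass(frozen=True)
class Section:
    name: str
    stability: str  # "stable", "semistable", or "volatile"
    text: str


def assemble(sections: list[Section]) -> tuple[str, str]:
    """Return (cacheable_prefix, volatile_suffix).

    sorted() is a stable sort, so sections keep their relative order
    within each tier.
    """
    ordered = sorted(sections, key=lambda s: STABILITY_RANK[s.stability])
    prefix = "\n\n".join(s.text for s in ordered if s.stability != "volatile")
    suffix = "\n\n".join(s.text for s in ordered if s.stability == "volatile")
    return prefix, suffix
```

Two requests that differ only in Turn Context produce the same prefix string, which is the property a prefix-based cache needs in order to reuse earlier work.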

Multilingual runtime

Multilingual support is not just interface translation.

For a data agent, language affects user questions, tool output, errors, analysis conclusions, and follow-up suggestions. Tukun.ai carries requested_locale into runtime context so prompt assembly and tool output can follow the user’s language environment.

The current product has been shaped around Chinese, English, and Japanese. Future languages should mainly require localized copy and language configuration, not a rewrite of the business workflow.
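
A small sketch of what carrying requested_locale into tool output could look like: the workflow calls one render function, and only the message catalog differs per language. The catalog contents, the English fallback rule, and the render function are illustrative assumptions. Only the requested_locale name comes from the product description.

```python
# Hypothetical message catalog: adding a language means adding an entry here.
MESSAGES: dict[str, dict[str, str]] = {
    "en": {"no_data": "No data found for {metric}."},
    "zh": {"no_data": "未找到 {metric} 的数据。"},
    "ja": {"no_data": "{metric} のデータが見つかりません。"},
}

FALLBACK_LOCALE = "en"  # assumption: English is the fallback


def render(key: str, requested_locale: str, **params: str) -> str:
    """Render a tool message in the requested locale, falling back to English."""
    catalog = MESSAGES.get(requested_locale, MESSAGES[FALLBACK_LOCALE])
    template = catalog.get(key, MESSAGES[FALLBACK_LOCALE][key])
    return template.format(**params)
```

The calling code stays the same for every language, which is the sense in which a new language should need localized copy and configuration rather than a new workflow.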

Skills extension

Beyond built-in analysis capabilities, Tukun.ai supports repeatable workflows through Skills.

Examples include:

industry-specific analysis templates
report generation from fixed templates
PPT generation from data results
team-specific analysis methods
domain-specific data processing and explanation flows

Skills participate in runtime prompt and tool context as capability bundles. They are not just UI shortcuts.
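
As a rough picture of a capability bundle, a Skill can be modeled as a prompt fragment plus the tools it brings, both merged into the runtime context for a request. The Skill class, the build_capabilities function, and the tool names are hypothetical. This is a sketch of the idea, not the product's Skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical bundle: a prompt fragment plus the tools it needs."""
    name: str
    prompt: str
    tools: tuple[str, ...] = ()


def build_capabilities(base_tools: list[str], skills: list[Skill]) -> dict:
    """Merge built-in tools with Skill tools and collect Skill prompts."""
    tools = list(base_tools)
    for skill in skills:
        for tool in skill.tools:
            if tool not in tools:  # keep first occurrence, drop duplicates
                tools.append(tool)
    return {
        "skill_prompts": [skill.prompt for skill in skills],
        "tools": tools,
    }
```

Enabling a Skill therefore changes both what the model is told and what it is allowed to call, which is more than a UI shortcut could do.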

How it differs from traditional BI and generic chat assistants

Comparison	Traditional BI	Generic chat assistant	Tukun.ai
Entry point	Dashboards / reports	Chat box	Semantics + Workbench
Semantic management	Often scattered	Mostly absent	Built in
Data access	Available, but configuration-heavy	Weak	Part of the analysis workflow
Analysis process	Fixed	Temporary	Follow-up, review, and reuse
Result preservation	Dashboard-first	Context-memory dependent	Cards, charts, dashboards, and semantic assets

Tukun.ai is not trying to replace every BI product, and it is not trying to become a generic chat entry point.

It focuses on one path: start from a natural language question, constrain it with governed semantics, execute with tools, and preserve the result as a reusable analysis asset.

Current stage

Tukun.ai already has the core framework in place:

Data Agent Harness
semantic-first workflow
multiple data source support
multi-LLM configuration
Prompt Cache-friendly context layering
multilingual runtime
Skills extension
Workbench and reusable analysis assets

Next, the product will keep improving around real analysis workflows: making semantic modeling steadier, making the analysis path easier to inspect, and making result reuse more natural.

The point of a data agent is not to make a model sound more conversational. The point is to make analysis more reliable, controllable, and cumulative.

That is the direction of Tukun.ai.