feat: Add intelligent auto-router and enhanced integrations

- Add intelligent-router.sh hook for automatic agent routing
- Add AUTO-TRIGGER-SUMMARY.md documentation
- Add FINAL-INTEGRATION-SUMMARY.md documentation
- Complete Prometheus integration (6 commands + 4 tools)
- Complete Dexto integration (12 commands + 5 tools)
- Enhanced Ralph with access to all agents
- Fix /clawd command (removed disable-model-invocation)
- Update hooks.json to v5 with intelligent routing
- 291 total skills now available
- All 21 commands with automatic routing

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
199
dexto/packages/core/src/telemetry/README.md
Normal file
@@ -0,0 +1,199 @@
# Telemetry Module

OpenTelemetry distributed tracing for Dexto agent operations.

## What It Does

- **Traces execution flow** across DextoAgent, LLM services, and tool operations
- **Captures token usage** for all LLM calls (input/output/total tokens)
- **Exports to OTLP-compatible backends** (Jaeger, Grafana, etc.)
- **Minimal overhead when disabled** - instrumentation is opt-in, and decorated methods call straight through to the original method when telemetry is inactive

## Architecture

### Decorator-Based Instrumentation

The `@InstrumentClass` decorator is applied to classes on critical execution paths:

- `DextoAgent` - Top-level orchestrator
- `VercelLLMService` - LLM operations (all providers via Vercel AI SDK)
- `ToolManager` - Tool execution

**Not decorated** (following selective instrumentation strategy):
- Low-level services (MCPManager, SessionManager, PluginManager)
- Storage/memory operations (ResourceManager, MemoryManager)

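
As a rough illustration of what class-level instrumentation does, the sketch below wraps every prototype method of a class and records a span name before delegating to the original method. It is a simplified stand-in and not the real `@InstrumentClass`: `instrumentClass`, `recordedSpans`, and `ToolRunner` are hypothetical names, and an array takes the place of an OpenTelemetry tracer.

```typescript
// Hypothetical stand-in for a tracer: records span names instead of exporting them.
const recordedSpans: string[] = [];

interface InstrumentOptions {
    prefix?: string;
    excludeMethods?: string[];
}

type Constructor = new (...args: any[]) => object;

// Simplified sketch of a class-level instrumenter (not the real @InstrumentClass).
function instrumentClass(options: InstrumentOptions = {}) {
    return function <T extends Constructor>(target: T): T {
        for (const method of Object.getOwnPropertyNames(target.prototype)) {
            if (method === 'constructor' || options.excludeMethods?.includes(method)) continue;

            const descriptor = Object.getOwnPropertyDescriptor(target.prototype, method);
            if (!descriptor || typeof descriptor.value !== 'function') continue;

            const original = descriptor.value as (...args: unknown[]) => unknown;
            const spanName = options.prefix ? `${options.prefix}.${method}` : method;

            descriptor.value = function (this: unknown, ...args: unknown[]) {
                recordedSpans.push(spanName); // a real implementation would start a span here
                return original.apply(this, args);
            };
            Object.defineProperty(target.prototype, method, descriptor);
        }
        return target;
    };
}

class ToolRunner {
    run(name: string): string {
        return this.format(name);
    }
    format(name: string): string {
        return `ran ${name}`;
    }
}

// Applied as a plain function call so the sketch needs no decorator compiler flags.
instrumentClass({ prefix: 'tool', excludeMethods: ['format'] })(ToolRunner);

const output = new ToolRunner().run('search');
```

Because `format` is excluded, only `tool.run` is recorded even though `run` calls `format` internally. This is the same selective idea as leaving low-level services undecorated.
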
### Initialization

Telemetry is initialized in `createAgentServices()` **before** any decorated classes are instantiated:

```typescript
// packages/core/src/utils/service-initializer.ts
if (config.telemetry?.enabled) {
    await Telemetry.init(config.telemetry);
}
```

### Agent Switching

For sequential agent switching, telemetry is shut down before creating the new agent:

```typescript
// packages/cli/src/api/server.ts
await Telemetry.shutdownGlobal(); // Old telemetry
newAgent = await getDexto().createAgent(agentId); // Fresh telemetry
```

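
To see why the shutdown has to come first, here is a minimal sketch of a global-singleton lifecycle. It is not the actual `Telemetry` class: `TelemetrySketch` is a hypothetical name, the methods are synchronous for brevity (the real ones are awaited), and the behavior that a second `init()` keeps the existing instance is an assumption made for illustration.

```typescript
// Hypothetical, simplified stand-in for the Telemetry class; only the lifecycle is modeled.
class TelemetrySketch {
    private static instance: TelemetrySketch | undefined;
    readonly serviceName: string;

    private constructor(serviceName: string) {
        this.serviceName = serviceName;
    }

    static init(config: { serviceName: string }): TelemetrySketch {
        // Assumption: a second init() without a shutdown keeps the existing instance.
        if (!TelemetrySketch.instance) {
            TelemetrySketch.instance = new TelemetrySketch(config.serviceName);
        }
        return TelemetrySketch.instance;
    }

    static hasGlobalInstance(): boolean {
        return TelemetrySketch.instance !== undefined;
    }

    static get(): TelemetrySketch {
        if (!TelemetrySketch.instance) {
            throw new Error('Telemetry not initialized. Call Telemetry.init() first.');
        }
        return TelemetrySketch.instance;
    }

    static shutdownGlobal(): void {
        TelemetrySketch.instance = undefined;
    }
}

// Without a shutdown, the second agent would inherit the first agent's service name.
TelemetrySketch.init({ serviceName: 'agent-a' });
const stale = TelemetrySketch.init({ serviceName: 'agent-b' }).serviceName;

// With a shutdown in between, the new agent gets fresh telemetry.
TelemetrySketch.shutdownGlobal();
const fresh = TelemetrySketch.init({ serviceName: 'agent-b' }).serviceName;
```

Under these assumptions `stale` still carries the first agent's service name, while `fresh` reflects the new agent's configuration.
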
## Configuration

Enable in your agent config:

```yaml
# agents/my-agent.yml
telemetry:
  enabled: true
  serviceName: my-dexto-agent
  tracerName: dexto-tracer
  export:
    type: otlp
    protocol: http
    endpoint: http://localhost:4318/v1/traces
```

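
The config schema in `schemas.ts` accepts the `endpoint` either as a full URL or as a bare `host:port` pair. The sketch below approximates that check without Zod. `isAcceptedEndpoint` is a hypothetical helper, and `new URL()` is only an approximation of the schema's URL validation, so edge cases may differ.

```typescript
// Same host:port pattern as the schema: word characters, dots, or dashes, then a port.
const HOST_PORT = /^[\w.-]+:\d+$/;

// Hypothetical helper approximating the schema's endpoint union.
function isAcceptedEndpoint(endpoint: string): boolean {
    if (HOST_PORT.test(endpoint)) return true;
    try {
        new URL(endpoint); // throws on strings that are not parseable URLs
        return true;
    } catch {
        return false;
    }
}

const httpEndpointOk = isAcceptedEndpoint('http://localhost:4318/v1/traces');
const hostPortOk = isAcceptedEndpoint('localhost:4317');
const garbageOk = isAcceptedEndpoint('not an endpoint');
```

The `host:port` form is the usual shape for gRPC collectors, while the HTTP protocol normally takes a full URL including the `/v1/traces` path.
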
## Testing with Jaeger

### 1. Start Jaeger

```bash
docker run -d \
  --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/all-in-one:latest
```

**Ports:**
- `16686` - Jaeger UI (web interface)
- `4318` - OTLP HTTP receiver (where Dexto sends traces)

### 2. Enable Telemetry

Telemetry is already enabled in `agents/default-agent.yml`. To disable, set `enabled: false`.

### 3. Run Dexto

```bash
# Start Dexto in development mode
pnpm run dev
```

### 4. Generate Traces

Send messages through the CLI or WebUI to generate traces.

### 5. View Traces

1. Open Jaeger UI: http://localhost:16686
2. Select service: `dexto-default-agent`
3. Click "Find Traces"
4. Select an operation: `agent.run`

### 6. Verify Trace Structure

Click on a trace to see the span hierarchy:

```
agent.run (20.95s total)
├─ agent.maybeGenerateTitle (14.99ms)
└─ llm.vercel.completeTask (20.93s)
   └─ llm.vercel.streamText (20.92s)
      ├─ POST https://api.openai.com/... (10.01s) ← HTTP auto-instrumentation
      └─ POST https://api.openai.com/... (10.79s) ← HTTP auto-instrumentation
```

**What to verify:**
- ✅ Span names use correct prefixes (`agent.`, `llm.vercel.`)
- ✅ Span hierarchy shows parent-child relationships
- ✅ HTTP auto-instrumentation captures API calls
- ✅ Token usage attributes: `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`
- ✅ No errors in console logs

### 7. Cleanup

```bash
docker stop jaeger
docker rm jaeger
```

## Module Structure

```
telemetry/
├── README.md        # This file
├── telemetry.ts     # Core Telemetry class, SDK initialization
├── decorators.ts    # @InstrumentClass decorator implementation
├── schemas.ts       # Zod schemas for telemetry config
├── types.ts         # TypeScript types for spans and traces
├── exporters.ts     # CompositeExporter for multi-destination support
└── utils.ts         # Helper functions
```

## Key Files

### `telemetry.ts`
- `Telemetry.init(config)` - Initialize OpenTelemetry SDK
- `Telemetry.shutdownGlobal()` - Shutdown for agent switching
- `Telemetry.get()` - Get initialized instance

### `decorators.ts`
- `@InstrumentClass(options)` - Decorator for automatic tracing
- `withSpan(options)` - Method decorator factory for tracing a single method

### `exporters.ts`
- `CompositeExporter` - Multi-destination exporting with recursive telemetry filtering

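
The recursive telemetry filtering in `CompositeExporter` can be pictured as a two-pass filter: first collect the trace IDs of any span that touches `/api/telemetry`, then drop every span belonging to those traces. The sketch below uses a hypothetical `SpanLike` shape with a single `path` field in place of the real span attributes, and skips the full URL parsing the real code performs.

```typescript
// Hypothetical minimal span shape; `path` stands in for attributes such as http.target or url.path.
interface SpanLike {
    traceId: string;
    name: string;
    path?: string;
}

function isTelemetryPath(path: string): boolean {
    let normalized = path.toLowerCase().trim();
    if (normalized.endsWith('/')) normalized = normalized.slice(0, -1);
    // Exact match or sub-path only, so '/api/telemetry-export' does not count.
    return normalized === '/api/telemetry' || normalized.startsWith('/api/telemetry/');
}

function dropTelemetryTraces(spans: SpanLike[]): SpanLike[] {
    // Pass 1: trace IDs that contain at least one telemetry-endpoint span.
    const telemetryTraceIds = new Set(
        spans
            .filter((s) => s.path !== undefined && isTelemetryPath(s.path))
            .map((s) => s.traceId)
    );
    // Pass 2: drop every span from those traces, not just the endpoint span itself.
    return spans.filter((s) => !telemetryTraceIds.has(s.traceId));
}

const kept = dropTelemetryTraces([
    { traceId: 'a', name: 'agent.run' },
    { traceId: 'b', name: 'POST', path: '/api/telemetry/' },
    { traceId: 'b', name: 'middleware' }, // same trace as the telemetry call, so dropped too
    { traceId: 'c', name: 'GET', path: '/api/telemetry-export' },
]);
```

Filtering by trace ID rather than by individual span means child spans of a telemetry request are removed as well, which is what keeps the export loop from feeding itself.
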
## Adding Telemetry to New Modules

Use the `@InstrumentClass` decorator on classes in critical execution paths:

```typescript
import { trace } from '@opentelemetry/api';
import { InstrumentClass } from '../telemetry/decorators.js';

@InstrumentClass({
    prefix: 'mymodule', // Span prefix: mymodule.methodName
    excludeMethods: ['helper'], // Methods to skip
})
export class MyModule {
    async process(data: string): Promise<void> {
        // Span automatically created: "mymodule.process"
        // Add custom attributes to active span:
        const span = trace.getActiveSpan();
        if (span) {
            span.setAttribute('data.length', data.length);
        }
    }
}
```

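
One detail that matters when decorating async methods is when the span ends. The sketch below shows the general pattern of ending a span immediately for synchronous results and only after settlement for promises. `traced` and `events` are hypothetical names, and string events stand in for real span start and end calls.

```typescript
// Hypothetical event log standing in for span start/end calls.
const events: string[] = [];

// Hypothetical helper: wraps fn and logs when its "span" starts and ends.
function traced(spanName: string, fn: () => unknown): unknown {
    events.push(`start ${spanName}`);
    let result: unknown;
    try {
        result = fn();
        if (result instanceof Promise) {
            // Async method: end the span only once the promise settles.
            return result.finally(() => {
                events.push(`end ${spanName}`);
            });
        }
        return result;
    } finally {
        // Sync method (or synchronous throw): end the span immediately.
        if (!(result instanceof Promise)) {
            events.push(`end ${spanName}`);
        }
    }
}

traced('sync.work', () => 42);
const afterSync = events.slice();

const pending = traced('async.work', () => Promise.resolve('done'));
const afterAsyncCall = events.slice();
```

Right after the async call returns, the log contains the start event for `async.work` but not yet its end event. The end event is appended once `pending` settles, so the span duration covers the awaited work.
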
## Troubleshooting

**No traces appearing in Jaeger?**
1. Verify Jaeger is running: `docker ps | grep jaeger`
2. Check endpoint in agent config: `http://localhost:4318/v1/traces`
3. Check console for "Telemetry initialized" log
4. Verify `enabled: true` in telemetry config

**Only seeing HTTP GET/POST spans?**
- These are from OpenTelemetry's automatic HTTP instrumentation (expected!)
- Filter by Operation: `agent.run` to see decorated spans
- Click into a trace to see the full hierarchy

**Build errors?**
- Run `pnpm install` if dependencies are missing
- Ensure you're on the `telemetry` branch

## Further Documentation

- Full feature plan: `/feature-plans/telemetry.md`
- Configuration options: See `schemas.ts`
- OpenTelemetry docs: https://opentelemetry.io/docs/
271
dexto/packages/core/src/telemetry/decorators.ts
Normal file
@@ -0,0 +1,271 @@
import {
    trace,
    context,
    SpanStatusCode,
    SpanKind,
    propagation,
    SpanOptions,
    type BaggageEntry,
} from '@opentelemetry/api';
import type { IDextoLogger } from '../logger/v2/types.js';
import { hasActiveTelemetry, getBaggageValues } from './utils.js';
import { safeStringify } from '../utils/safe-stringify.js';

// Decorator factory that takes optional spanName
export function withSpan(options: {
    spanName?: string;
    skipIfNoTelemetry?: boolean;
    spanKind?: SpanKind;
    tracerName?: string;
}): any {
    return function (
        _target: unknown,
        propertyKey: string | symbol,
        descriptor?: PropertyDescriptor | number
    ) {
        if (!descriptor || typeof descriptor === 'number') return;

        const originalMethod = descriptor.value as Function;
        const methodName = String(propertyKey);

        descriptor.value = function (this: unknown, ...args: unknown[]) {
            // Try to get logger from instance for DI pattern (optional)
            const logger = (this as any)?.logger as IDextoLogger | undefined;

            // Skip if no telemetry is available and skipIfNoTelemetry is true
            // Guard against Telemetry.get() throwing if globalThis.__TELEMETRY__ is not yet defined
            if (
                options?.skipIfNoTelemetry &&
                (!globalThis.__TELEMETRY__ || !hasActiveTelemetry(logger))
            ) {
                return originalMethod.apply(this, args);
            }
            const tracer = trace.getTracer(options?.tracerName ?? 'dexto');

            // Determine span name and kind
            let spanName: string = methodName; // Default spanName
            let spanKind: SpanKind | undefined;

            if (options) {
                // options is always an object here due to decorator factory
                spanName = options.spanName ?? methodName;
                if (options.spanKind !== undefined) {
                    spanKind = options.spanKind;
                }
            }

            // Start the span with optional kind
            const spanOptions: SpanOptions = {};
            if (spanKind !== undefined) {
                spanOptions.kind = spanKind;
            }
            const span = tracer.startSpan(spanName, spanOptions);
            let ctx = trace.setSpan(context.active(), span);

            // Record input arguments as span attributes (sanitized and truncated)
            args.forEach((arg, index) => {
                span.setAttribute(`${spanName}.argument.${index}`, safeStringify(arg, 8192));
            });

            // Extract baggage values from the current context (may include values set by parent spans)
            const { requestId, componentName, runId, threadId, resourceId, sessionId } =
                getBaggageValues(ctx);

            // Add all baggage values to span attributes
            // Set both direct attributes and baggage-prefixed versions for storage schema fallback
            if (sessionId) {
                span.setAttribute('sessionId', sessionId);
                span.setAttribute('baggage.sessionId', sessionId); // Fallback for storage
            }

            if (requestId) {
                span.setAttribute('http.request_id', requestId);
                span.setAttribute('baggage.http.request_id', requestId);
            }

            if (threadId) {
                span.setAttribute('threadId', threadId);
                span.setAttribute('baggage.threadId', threadId);
            }

            if (resourceId) {
                span.setAttribute('resourceId', resourceId);
                span.setAttribute('baggage.resourceId', resourceId);
            }

            if (runId !== undefined) {
                span.setAttribute('runId', String(runId));
                span.setAttribute('baggage.runId', String(runId));
            }

            if (componentName) {
                span.setAttribute('componentName', componentName);
                span.setAttribute('baggage.componentName', componentName);
            } else if (this && typeof this === 'object') {
                const contextObj = this as {
                    name?: string;
                    runId?: string;
                    constructor?: { name?: string };
                };
                // Prefer instance.name, fallback to constructor.name
                const inferredName = contextObj.name ?? contextObj.constructor?.name;
                if (inferredName) {
                    span.setAttribute('componentName', inferredName);
                }
                if (contextObj.runId) {
                    span.setAttribute('runId', contextObj.runId);
                    span.setAttribute('baggage.runId', contextObj.runId);
                }

                // Merge with existing baggage to preserve parent context values
                const existingBaggage = propagation.getBaggage(ctx);
                const baggageEntries: Record<string, BaggageEntry> = {};

                // Copy all existing baggage entries to preserve custom baggage
                if (existingBaggage) {
                    existingBaggage.getAllEntries().forEach(([key, entry]) => {
                        baggageEntries[key] = entry;
                    });
                }

                // Preserve existing baggage values and metadata
                if (sessionId !== undefined) {
                    baggageEntries.sessionId = {
                        ...baggageEntries.sessionId,
                        value: String(sessionId),
                    };
                }
                if (requestId !== undefined) {
                    baggageEntries['http.request_id'] = {
                        ...baggageEntries['http.request_id'],
                        value: String(requestId),
                    };
                }
                if (threadId !== undefined) {
                    baggageEntries.threadId = {
                        ...baggageEntries.threadId,
                        value: String(threadId),
                    };
                }
                if (resourceId !== undefined) {
                    baggageEntries.resourceId = {
                        ...baggageEntries.resourceId,
                        value: String(resourceId),
                    };
                }

                // Add new component-specific baggage values
                if (inferredName !== undefined) {
                    baggageEntries.componentName = {
                        ...baggageEntries.componentName,
                        value: String(inferredName),
                    };
                }
                if (contextObj.runId !== undefined) {
                    baggageEntries.runId = {
                        ...baggageEntries.runId,
                        value: String(contextObj.runId),
                    };
                }

                if (Object.keys(baggageEntries).length > 0) {
                    ctx = propagation.setBaggage(ctx, propagation.createBaggage(baggageEntries));
                }
            }

            let result: unknown;
            try {
                // Call the original method within the context
                result = context.with(ctx, () => originalMethod.apply(this, args));

                // Handle promises
                if (result instanceof Promise) {
                    return result
                        .then((resolvedValue) => {
                            span.setAttribute(
                                `${spanName}.result`,
                                safeStringify(resolvedValue, 8192)
                            );
                            return resolvedValue;
                        })
                        .catch((error) => {
                            span.recordException(error);
                            span.setStatus({
                                code: SpanStatusCode.ERROR,
                                message: error?.toString(),
                            });
                            throw error;
                        })
                        .finally(() => {
                            span.end();
                        });
                }

                // Record result for non-promise returns (sanitized and truncated)
                span.setAttribute(`${spanName}.result`, safeStringify(result, 8192));
                // Return regular results
                return result;
            } catch (error) {
                // Try to use instance logger if available (DI pattern)
                const logger = (this as any)?.logger as IDextoLogger | undefined;
                logger?.error(
                    `withSpan: Error in method '${methodName}': ${error instanceof Error ? error.message : String(error)}`,
                    { error }
                );
                span.setStatus({
                    code: SpanStatusCode.ERROR,
                    message: error instanceof Error ? error.message : 'Unknown error',
                });
                if (error instanceof Error) {
                    span.recordException(error);
                }
                throw error;
            } finally {
                // End span for non-promise returns
                if (!(result instanceof Promise)) {
                    span.end();
                }
            }
        };

        return descriptor;
    };
}

// Class decorator: wraps each prototype method in a span via withSpan
export function InstrumentClass(options?: {
    prefix?: string;
    spanKind?: SpanKind;
    excludeMethods?: string[];
    methodFilter?: (methodName: string) => boolean;
    tracerName?: string;
}) {
    return function <T extends { new (...args: any[]): {} }>(target: T) {
        const methods = Object.getOwnPropertyNames(target.prototype);
        methods.forEach((method) => {
            // Skip excluded methods
            if (options?.excludeMethods?.includes(method) || method === 'constructor') {
                return;
            }
            // Apply method filter if provided
            if (options?.methodFilter && !options.methodFilter(method)) return;

            const descriptor = Object.getOwnPropertyDescriptor(target.prototype, method);
            if (descriptor && typeof descriptor.value === 'function') {
                Object.defineProperty(
                    target.prototype,
                    method,
                    withSpan({
                        spanName: options?.prefix ? `${options.prefix}.${method}` : method,
                        skipIfNoTelemetry: true,
                        spanKind: options?.spanKind || SpanKind.INTERNAL,
                        ...(options?.tracerName !== undefined && {
                            tracerName: options.tracerName,
                        }),
                    })(target, method, descriptor)
                );
            }
        });
        return target;
    };
}
19
dexto/packages/core/src/telemetry/error-codes.ts
Normal file
@@ -0,0 +1,19 @@
/**
 * Telemetry-specific error codes
 * Covers initialization, dependencies, configuration, and shutdown
 */
export enum TelemetryErrorCode {
    // Initialization errors
    INITIALIZATION_FAILED = 'telemetry_initialization_failed',
    NOT_INITIALIZED = 'telemetry_not_initialized',

    // Dependency errors
    DEPENDENCY_NOT_INSTALLED = 'telemetry_dependency_not_installed',
    EXPORTER_DEPENDENCY_NOT_INSTALLED = 'telemetry_exporter_dependency_not_installed',

    // Configuration errors
    INVALID_CONFIG = 'telemetry_invalid_config',

    // Shutdown errors
    SHUTDOWN_FAILED = 'telemetry_shutdown_failed',
}
91
dexto/packages/core/src/telemetry/errors.ts
Normal file
@@ -0,0 +1,91 @@
import { DextoRuntimeError } from '../errors/DextoRuntimeError.js';
import { ErrorScope, ErrorType } from '../errors/types.js';
import { TelemetryErrorCode } from './error-codes.js';

/**
 * Telemetry error factory with typed methods for creating telemetry-specific errors
 * Each method creates a properly typed error with TELEMETRY scope
 */
export class TelemetryError {
    /**
     * Required OpenTelemetry dependencies not installed
     */
    static dependencyNotInstalled(packages: string[]): DextoRuntimeError {
        return new DextoRuntimeError(
            TelemetryErrorCode.DEPENDENCY_NOT_INSTALLED,
            ErrorScope.TELEMETRY,
            ErrorType.USER,
            'Telemetry is enabled but required OpenTelemetry packages are not installed.',
            {
                packages,
                hint: `Install with: npm install ${packages.join(' ')}`,
                recovery: 'Or disable telemetry by setting enabled: false in your configuration.',
            }
        );
    }

    /**
     * Specific exporter dependency not installed (gRPC or HTTP)
     */
    static exporterDependencyNotInstalled(
        exporterType: 'grpc' | 'http',
        packageName: string
    ): DextoRuntimeError {
        return new DextoRuntimeError(
            TelemetryErrorCode.EXPORTER_DEPENDENCY_NOT_INSTALLED,
            ErrorScope.TELEMETRY,
            ErrorType.USER,
            `OTLP ${exporterType.toUpperCase()} exporter configured but '${packageName}' is not installed.`,
            {
                exporterType,
                packageName,
                hint: `Install with: npm install ${packageName}`,
            }
        );
    }

    /**
     * Telemetry initialization failed
     */
    static initializationFailed(reason: string, originalError?: unknown): DextoRuntimeError {
        return new DextoRuntimeError(
            TelemetryErrorCode.INITIALIZATION_FAILED,
            ErrorScope.TELEMETRY,
            ErrorType.SYSTEM,
            `Failed to initialize telemetry: ${reason}`,
            {
                reason,
                originalError:
                    originalError instanceof Error ? originalError.message : String(originalError),
            }
        );
    }

    /**
     * Telemetry not initialized when expected
     */
    static notInitialized(): DextoRuntimeError {
        return new DextoRuntimeError(
            TelemetryErrorCode.NOT_INITIALIZED,
            ErrorScope.TELEMETRY,
            ErrorType.USER,
            'Telemetry not initialized. Call Telemetry.init() first.',
            {
                hint: 'Ensure telemetry is initialized before accessing the global instance.',
            }
        );
    }

    /**
     * Telemetry shutdown failed (non-blocking warning)
     */
    static shutdownFailed(reason: string): DextoRuntimeError {
        return new DextoRuntimeError(
            TelemetryErrorCode.SHUTDOWN_FAILED,
            ErrorScope.TELEMETRY,
            ErrorType.SYSTEM,
            `Telemetry shutdown failed: ${reason}`,
            { reason }
        );
    }
}
131
dexto/packages/core/src/telemetry/exporters.ts
Normal file
@@ -0,0 +1,131 @@
import { ExportResultCode } from '@opentelemetry/core';
import type { ExportResult } from '@opentelemetry/core';
import type { ReadableSpan, SpanExporter } from '@opentelemetry/sdk-trace-base';

/**
 * Normalizes URL paths for consistent comparison
 * Handles both full URLs and path-only strings
 * @param url - URL or path to normalize
 * @returns Normalized lowercase path without trailing slash
 */
function normalizeUrlPath(url: string): string {
    try {
        const parsedUrl = new URL(url);
        let pathname = parsedUrl.pathname.toLowerCase().trim();
        if (pathname.endsWith('/')) {
            pathname = pathname.slice(0, -1);
        }
        return pathname;
    } catch (_e) {
        // If it's not a valid URL, treat it as a path and normalize
        let path = url.toLowerCase().trim();
        if (path.endsWith('/')) {
            path = path.slice(0, -1);
        }
        return path;
    }
}

/**
 * CompositeExporter wraps multiple span exporters and provides two key features:
 *
 * 1. **Multi-exporter support**: Exports spans to multiple destinations in parallel
 *    (e.g., console for development + OTLP for production monitoring)
 *
 * 2. **Recursive telemetry filtering**: Prevents infinite telemetry loops by dropping
 *    every span whose trace includes a request to a `/api/telemetry` endpoint. Without
 *    this, telemetry API calls would generate spans, which would be exported via HTTP
 *    to `/api/telemetry`, generating more spans and creating an infinite loop.
 *
 * @example
 * ```typescript
 * const exporter = new CompositeExporter([
 *     new ConsoleSpanExporter(),
 *     new OTLPHttpExporter({ url: 'http://localhost:4318/v1/traces' })
 * ]);
 * ```
 */
export class CompositeExporter implements SpanExporter {
    private exporters: SpanExporter[];

    constructor(exporters: SpanExporter[]) {
        this.exporters = exporters;
    }

    export(spans: ReadableSpan[], resultCallback: (result: ExportResult) => void): void {
        // First collect all traceIds from telemetry endpoint spans
        const telemetryTraceIds = new Set(
            spans
                .filter((span) => {
                    const attrs = span.attributes || {};
                    const relevantHttpAttributes = [
                        attrs['http.target'],
                        attrs['http.route'],
                        attrs['http.url'],
                        attrs['url.path'],
                        attrs['http.request_path'],
                    ];

                    const isTelemetrySpan = relevantHttpAttributes.some((attr) => {
                        if (typeof attr === 'string') {
                            const normalizedPath = normalizeUrlPath(attr);
                            // Check for exact match or path prefix
                            return (
                                normalizedPath === '/api/telemetry' ||
                                normalizedPath.startsWith('/api/telemetry/')
                            );
                        }
                        return false;
                    });
                    return isTelemetrySpan;
                })
                .map((span) => span.spanContext().traceId)
        );

        // Then filter out any spans that have those traceIds
        const filteredSpans = spans.filter(
            (span) => !telemetryTraceIds.has(span.spanContext().traceId)
        );

        // Return early if no spans to export
        if (filteredSpans.length === 0) {
            resultCallback({ code: ExportResultCode.SUCCESS });
            return;
        }

        void Promise.all(
            this.exporters.map(
                (exporter) =>
                    new Promise<ExportResult>((resolve) => {
                        if (exporter.export) {
                            exporter.export(filteredSpans, resolve);
                        } else {
                            resolve({ code: ExportResultCode.FAILED });
                        }
                    })
            )
        )
            .then((results) => {
                const hasError = results.some((r) => r.code === ExportResultCode.FAILED);
                resultCallback({
                    code: hasError ? ExportResultCode.FAILED : ExportResultCode.SUCCESS,
                });
            })
            .catch((error) => {
                console.error('[CompositeExporter] Export error:', error);
                resultCallback({ code: ExportResultCode.FAILED });
            });
    }

    shutdown(): Promise<void> {
        return Promise.all(this.exporters.map((e) => e.shutdown?.() ?? Promise.resolve())).then(
            () => undefined
        );
    }

    forceFlush(): Promise<void> {
        return Promise.all(this.exporters.map((e) => e.forceFlush?.() ?? Promise.resolve())).then(
            () => undefined
        );
    }
}
@@ -0,0 +1,148 @@
/**
 * Integration test for HTTP instrumentation.
 *
 * This test verifies that OpenTelemetry's HTTP/fetch instrumentation is working correctly.
 * It makes actual HTTP calls and verifies that spans are created for them.
 *
 * This is critical for ensuring that LLM API calls (which use fetch) are traced.
 *
 * NOTE: This test sets up OpenTelemetry SDK directly (not via Telemetry class) to verify
 * that the specific instrumentations (http + undici) correctly instrument fetch() calls.
 * This mirrors the production setup in telemetry.ts.
 */
import { describe, test, expect, afterAll, beforeAll } from 'vitest';
import { NodeSDK } from '@opentelemetry/sdk-node';
import { InMemorySpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { Resource } from '@opentelemetry/resources';
import { ATTR_SERVICE_NAME } from '@opentelemetry/semantic-conventions';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { UndiciInstrumentation } from '@opentelemetry/instrumentation-undici';

describe('HTTP Instrumentation', () => {
    let serverPort: number;
    let memoryExporter: InMemorySpanExporter;
    let sdk: NodeSDK;
    let server: Awaited<ReturnType<typeof import('http').createServer>>;

    beforeAll(async () => {
        // Create in-memory exporter
        memoryExporter = new InMemorySpanExporter();

        // Initialize OpenTelemetry SDK directly with specific instrumentations
        // This mirrors the production setup in telemetry.ts
        sdk = new NodeSDK({
            resource: new Resource({
                [ATTR_SERVICE_NAME]: 'http-instrumentation-test',
            }),
            spanProcessor: new SimpleSpanProcessor(memoryExporter) as any,
            instrumentations: [new HttpInstrumentation(), new UndiciInstrumentation()],
        });

        await sdk.start();

        // NOW import http and create the server (after instrumentation is set up)
        const http = await import('http');
        server = http.createServer((req, res) => {
            res.writeHead(200, { 'Content-Type': 'application/json' });
            res.end(JSON.stringify({ message: 'ok', path: req.url }));
        });

        await new Promise<void>((resolve) => {
            server.listen(0, '127.0.0.1', () => {
                const addr = server.address();
                if (addr && typeof addr === 'object') {
                    serverPort = addr.port;
                }
                resolve();
            });
        });
    });

    afterAll(async () => {
        // Close server first
        if (server) {
            await new Promise<void>((resolve, reject) => {
                server.close((err) => (err ? reject(err) : resolve()));
            });
        }

        // Then shutdown SDK
        if (sdk) {
            await sdk.shutdown();
        }
    });

    test('fetch() calls are instrumented and create HTTP spans', async () => {
        // Clear any previous spans
        memoryExporter.reset();

        // Make a fetch call - this should be instrumented by undici instrumentation
        // (Node.js 18+ uses undici internally for fetch())
        const url = `http://127.0.0.1:${serverPort}/test-fetch-endpoint`;
        const response = await fetch(url);
        const data = await response.json();
        expect(data.message).toBe('ok');

        // Give time for async span processing
        await new Promise((resolve) => setTimeout(resolve, 500));

        // Check that spans were created
        const spans = memoryExporter.getFinishedSpans();

        // We should have at least one HTTP span
        const httpSpans = spans.filter((span) => {
            const name = span.name.toLowerCase();
            const attrs = span.attributes;
            return (
                name.includes('get') ||
                name.includes('http') ||
                name.includes('fetch') ||
                attrs['http.method'] === 'GET' ||
                attrs['http.request.method'] === 'GET'
            );
        });

        expect(httpSpans.length).toBeGreaterThan(0);

        // Verify the span has expected HTTP attributes
        const httpSpan = httpSpans[0]!;
        const attrs = httpSpan.attributes;

        // Should have URL-related attributes
        expect(attrs['url.full'] || attrs['http.url'] || attrs['http.target']).toBeDefined();

        // Should have method attribute
        expect(attrs['http.request.method'] || attrs['http.method']).toBe('GET');

        // Should have status code
        expect(attrs['http.response.status_code'] || attrs['http.status_code']).toBe(200);
    });

    test('multiple fetch() calls create multiple spans', async () => {
        // Clear any previous spans
        memoryExporter.reset();

        // Make multiple fetch calls
        const urls = [
            `http://127.0.0.1:${serverPort}/endpoint-1`,
            `http://127.0.0.1:${serverPort}/endpoint-2`,
            `http://127.0.0.1:${serverPort}/endpoint-3`,
        ];

        await Promise.all(urls.map((url) => fetch(url)));

        // Give time for async span processing
        await new Promise((resolve) => setTimeout(resolve, 500));

        // Check that spans were created
        const spans = memoryExporter.getFinishedSpans();

        // Should have at least 3 spans (one for each request)
        const httpSpans = spans.filter((span) => {
            const attrs = span.attributes;
            return attrs['http.request.method'] === 'GET' || attrs['http.method'] === 'GET';
        });

        expect(httpSpans.length).toBeGreaterThanOrEqual(3);
    });
});
1
dexto/packages/core/src/telemetry/index.ts
Normal file
@@ -0,0 +1 @@
export { Telemetry } from './telemetry.js';
50
dexto/packages/core/src/telemetry/schemas.ts
Normal file
@@ -0,0 +1,50 @@
import { z } from 'zod';

export const OtelConfigurationSchema = z.object({
    serviceName: z.string().optional(),
    enabled: z.boolean().optional(),
    tracerName: z.string().optional(),
    // TODO (Telemetry): Implement sampling support in Phase 5
    // Currently sampling schema is defined but not implemented in telemetry.ts
    // See feature-plans/telemetry.md Phase 5 for implementation details
    // sampling: z
    //     .discriminatedUnion('type', [
    //         z.object({
    //             type: z.literal('ratio'),
    //             probability: z.number().min(0).max(1),
    //         }),
    //         z.object({
    //             type: z.literal('always_on'),
    //         }),
    //         z.object({
    //             type: z.literal('always_off'),
    //         }),
    //         z.object({
    //             type: z.literal('parent_based'),
    //             root: z.object({
    //                 probability: z.number().min(0).max(1),
    //             }),
    //         }),
    //     ])
    //     .optional(),
    export: z
        .union([
            z.object({
                type: z.literal('otlp'),
                protocol: z.enum(['grpc', 'http']).optional(),
                endpoint: z
                    .union([
                        z.string().url(),
                        z.string().regex(/^[\w.-]+:\d+$/), // host:port
                    ])
                    .optional(),
                headers: z.record(z.string()).optional(),
            }),
            z.object({
                type: z.literal('console'),
            }),
        ])
        .optional(),
});

export type OtelConfiguration = z.output<typeof OtelConfigurationSchema>;
|
||||
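The `endpoint` field accepts either a full URL or a bare `host:port` shorthand. The sketch below exercises only the `host:port` regular expression from the schema against a few sample strings, without pulling in zod; the sample values and the `HOST_PORT` and `accepted` names are illustrative assumptions.

```typescript
// Same pattern as the schema's host:port branch.
const HOST_PORT = /^[\w.-]+:\d+$/;

const samples = [
    'localhost:4317', // matches: word characters, colon, digits
    'otel-collector.internal:4318', // matches: dots and hyphens are allowed in the host
    'localhost', // no port, so no match
    'http://localhost:4318', // has a scheme, so it is left to the URL branch instead
    '[::1]:4317', // bracketed IPv6 literals are not covered by this pattern
];

// Keep only the strings the shorthand branch would accept.
const accepted = samples.filter((value) => HOST_PORT.test(value));
```

Strings rejected here are not necessarily invalid endpoints: in the schema they still get a chance to pass the `z.string().url()` branch of the union.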
271
dexto/packages/core/src/telemetry/telemetry.test.ts
Normal file
@@ -0,0 +1,271 @@
|
||||
import { describe, test, expect, afterEach } from 'vitest';
|
||||
import { Telemetry } from './telemetry.js';
|
||||
import type { OtelConfiguration } from './schemas.js';
|
||||
|
||||
describe.sequential('Telemetry Core', () => {
|
||||
// Clean up after each test to prevent state leakage
|
||||
afterEach(async () => {
|
||||
// Force clear global state
|
||||
if (Telemetry.hasGlobalInstance()) {
|
||||
await Telemetry.shutdownGlobal();
|
||||
}
|
||||
// Longer delay to ensure cleanup completes and providers are unregistered
|
||||
await new Promise((resolve) => setTimeout(resolve, 100));
|
||||
});
|
||||
|
||||
describe('Initialization', () => {
|
||||
test('init() with enabled=true creates telemetry instance', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
serviceName: 'test-service',
|
||||
export: { type: 'console' },
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
|
||||
expect(telemetry).toBeDefined();
|
||||
expect(telemetry.isInitialized()).toBe(true);
|
||||
expect(telemetry.name).toBe('test-service');
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
});
|
||||
|
||||
test('init() with enabled=false creates instance but does not initialize SDK', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: false,
|
||||
serviceName: 'test-service',
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
|
||||
expect(telemetry).toBeDefined();
|
||||
expect(telemetry.isInitialized()).toBe(false);
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
});
|
||||
|
||||
test('init() with console exporter works', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
export: { type: 'console' },
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
|
||||
expect(telemetry.isInitialized()).toBe(true);
|
||||
});
|
||||
|
||||
test('init() with otlp-http exporter works', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
export: {
|
||||
type: 'otlp',
|
||||
protocol: 'http',
|
||||
endpoint: 'http://localhost:4318/v1/traces',
|
||||
},
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
|
||||
expect(telemetry.isInitialized()).toBe(true);
|
||||
});
|
||||
|
||||
test('init() with otlp-grpc exporter works', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
export: {
|
||||
type: 'otlp',
|
||||
protocol: 'grpc',
|
||||
endpoint: 'http://localhost:4317',
|
||||
},
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
|
||||
expect(telemetry.isInitialized()).toBe(true);
|
||||
});
|
||||
|
||||
test('init() is idempotent - returns same instance on subsequent calls', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
serviceName: 'test-service',
|
||||
export: { type: 'console' },
|
||||
};
|
||||
|
||||
const telemetry1 = await Telemetry.init(config);
|
||||
const telemetry2 = await Telemetry.init(config);
|
||||
const telemetry3 = await Telemetry.init({ enabled: false }); // Different config
|
||||
|
||||
// Should return the same instance regardless of config
|
||||
expect(telemetry1).toBe(telemetry2);
|
||||
expect(telemetry2).toBe(telemetry3);
|
||||
});
|
||||
|
||||
test('init() is race-safe - concurrent calls return same instance', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
serviceName: 'test-service',
|
||||
export: { type: 'console' },
|
||||
};
|
||||
|
||||
// Start multiple init calls concurrently
|
||||
const [telemetry1, telemetry2, telemetry3] = await Promise.all([
|
||||
Telemetry.init(config),
|
||||
Telemetry.init(config),
|
||||
Telemetry.init(config),
|
||||
]);
|
||||
|
||||
// All should return the same instance
|
||||
expect(telemetry1).toBe(telemetry2);
|
||||
expect(telemetry2).toBe(telemetry3);
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
});
|
||||
|
||||
test('get() throws when not initialized', () => {
|
||||
expect(() => Telemetry.get()).toThrow('Telemetry not initialized');
|
||||
});
|
||||
|
||||
test('get() returns instance after initialization', async () => {
|
||||
const config: OtelConfiguration = {
|
||||
enabled: true,
|
||||
export: { type: 'console' },
|
||||
};
|
||||
|
||||
const telemetry = await Telemetry.init(config);
|
||||
const retrieved = Telemetry.get();
|
||||
|
||||
expect(retrieved).toBe(telemetry);
|
||||
});
|
||||
|
||||
test('hasGlobalInstance() returns correct state', async () => {
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(false);
|
||||
|
||||
await Telemetry.init({ enabled: true, export: { type: 'console' } });
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
|
||||
await Telemetry.shutdownGlobal();
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
describe('Shutdown', () => {
|
||||
test('shutdownGlobal() clears global instance', async () => {
|
||||
await Telemetry.init({ enabled: true, export: { type: 'console' } });
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
|
||||
await Telemetry.shutdownGlobal();
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(false);
|
||||
});
|
||||
|
||||
test('shutdownGlobal() allows re-initialization', async () => {
|
||||
// First initialization
|
||||
const telemetry1 = await Telemetry.init({
|
||||
enabled: true,
|
||||
serviceName: 'service-1',
|
||||
export: { type: 'console' },
|
||||
});
|
||||
expect(telemetry1.name).toBe('service-1');
|
||||
|
||||
// Shutdown
|
||||
await Telemetry.shutdownGlobal();
|
||||
|
||||
// Second initialization with different config
|
||||
const telemetry2 = await Telemetry.init({
|
||||
enabled: true,
|
||||
serviceName: 'service-2',
|
||||
export: { type: 'console' },
|
||||
});
|
||||
expect(telemetry2.name).toBe('service-2');
|
||||
expect(telemetry2).not.toBe(telemetry1);
|
||||
});
|
||||
|
||||
test('shutdownGlobal() is safe to call when not initialized', async () => {
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(false);
|
||||
await expect(Telemetry.shutdownGlobal()).resolves.not.toThrow();
|
||||
});
|
||||
|
||||
test('shutdown() on instance clears isInitialized flag', async () => {
|
||||
const telemetry = await Telemetry.init({
|
||||
enabled: true,
|
||||
export: { type: 'console' },
|
||||
});
|
||||
expect(telemetry.isInitialized()).toBe(true);
|
||||
|
||||
await telemetry.shutdown();
|
||||
expect(telemetry.isInitialized()).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
// Note: Signal handler tests removed - they are implementation details
|
||||
// that are difficult to test reliably with mocks. Signal handlers are
|
||||
// manually verified to work correctly (process cleanup on SIGTERM/SIGINT).
|
||||
|
||||
describe('Agent Switching', () => {
|
||||
test('supports sequential agent switching with different configs', async () => {
|
||||
// Agent 1
|
||||
const telemetry1 = await Telemetry.init({
|
||||
enabled: true,
|
||||
serviceName: 'agent-1',
|
||||
export: { type: 'console' },
|
||||
});
|
||||
expect(telemetry1.name).toBe('agent-1');
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(true);
|
||||
|
||||
// Shutdown agent 1
|
||||
await Telemetry.shutdownGlobal();
|
||||
expect(Telemetry.hasGlobalInstance()).toBe(false);
|
||||
|
||||
// Agent 2 with different config
|
||||
const telemetry2 = await Telemetry.init({
|
||||
enabled: true,
|
||||
serviceName: 'agent-2',
|
||||
export: {
|
||||
type: 'otlp',
|
||||
protocol: 'http',
|
||||
endpoint: 'http://different:4318',
|
||||
},
|
||||
});
|
||||
expect(telemetry2.name).toBe('agent-2');
|
||||
expect(telemetry2).not.toBe(telemetry1);
|
||||
|
||||
// Shutdown agent 2
|
||||
await Telemetry.shutdownGlobal();
|
||||
|
||||
// Agent 3 - telemetry disabled
|
||||
const telemetry3 = await Telemetry.init({
|
||||
enabled: false,
|
||||
});
|
||||
expect(telemetry3.isInitialized()).toBe(false);
|
||||
expect(telemetry3).not.toBe(telemetry1);
|
||||
expect(telemetry3).not.toBe(telemetry2);
|
||||
});
|
||||
});
|
||||
|
||||
describe('Static Methods', () => {
|
||||
test('getActiveSpan() returns undefined when no active span', () => {
|
||||
const span = Telemetry.getActiveSpan();
|
||||
expect(span).toBeUndefined();
|
||||
});
|
||||
|
||||
test('setBaggage() creates new context with baggage', () => {
|
||||
const baggage = {
|
||||
sessionId: { value: 'test-session-123' },
|
||||
};
|
||||
|
||||
const newCtx = Telemetry.setBaggage(baggage);
|
||||
expect(newCtx).toBeDefined();
|
||||
});
|
||||
|
||||
test('withContext() executes function in given context', () => {
|
||||
const baggage = {
|
||||
testKey: { value: 'testValue' },
|
||||
};
|
||||
const ctx = Telemetry.setBaggage(baggage);
|
||||
|
||||
let executed = false;
|
||||
Telemetry.withContext(ctx, () => {
|
||||
executed = true;
|
||||
});
|
||||
|
||||
expect(executed).toBe(true);
|
||||
});
|
||||
});
|
||||
});
|
||||
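The idempotency and race-safety tests above rely on a single-flight initialization pattern: the first caller creates and stores a promise, and every later or concurrent caller receives the same instance no matter what config it passes. A minimal, dependency-free sketch of that pattern follows; `Service`, `created` and `reset` are hypothetical stand-ins, not the actual `Telemetry` implementation.

```typescript
// Hypothetical singleton illustrating single-flight async initialization.
class Service {
    private static instance: Service | undefined;
    private static initPromise: Promise<Service> | undefined;
    static created = 0; // counts constructor calls, for demonstration only
    readonly name: string;

    private constructor(name: string) {
        this.name = name;
        Service.created += 1;
    }

    static async init(name: string): Promise<Service> {
        // Fast path: already initialized; the config of later calls is ignored.
        if (Service.instance) return Service.instance;
        // In-flight path: share the pending initialization.
        if (Service.initPromise) return Service.initPromise;

        Service.initPromise = (async () => {
            await Promise.resolve(); // stands in for asynchronous SDK setup
            Service.instance = new Service(name);
            return Service.instance;
        })();

        try {
            return await Service.initPromise;
        } catch (error) {
            // Clear the cached promise so a later call can retry.
            Service.initPromise = undefined;
            throw error;
        }
    }

    // Counterpart of a global shutdown: forget the instance and the promise.
    static reset(): void {
        Service.instance = undefined;
        Service.initPromise = undefined;
    }
}
```

Because the checks and the promise assignment happen synchronously before the first `await`, three concurrent `init()` calls construct only one instance. After `reset()`, the next `init()` builds a fresh instance with the new config.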
361
dexto/packages/core/src/telemetry/telemetry.ts
Normal file
@@ -0,0 +1,361 @@
|
||||
import { context as otlpContext, trace, propagation } from '@opentelemetry/api';
|
||||
import type { Tracer, Context, BaggageEntry } from '@opentelemetry/api';
|
||||
import type { OtelConfiguration } from './schemas.js';
|
||||
import { logger } from '../logger/logger.js';
|
||||
import { TelemetryError } from './errors.js';
|
||||
import { DextoRuntimeError } from '../errors/DextoRuntimeError.js';
|
||||
|
||||
// Type definitions for dynamically imported modules
|
||||
type NodeSDKType = import('@opentelemetry/sdk-node').NodeSDK;
|
||||
type ConsoleSpanExporterType = import('@opentelemetry/sdk-trace-base').ConsoleSpanExporter;
|
||||
type OTLPHttpExporterType = import('@opentelemetry/exporter-trace-otlp-http').OTLPTraceExporter;
|
||||
type OTLPGrpcExporterType = import('@opentelemetry/exporter-trace-otlp-grpc').OTLPTraceExporter;
|
||||
|
||||
// Add type declaration for global namespace
|
||||
declare global {
|
||||
var __TELEMETRY__: Telemetry | undefined;
|
||||
}
|
||||
|
||||
/**
|
||||
* TODO (Telemetry): enhancements
|
||||
* - Implement sampling strategies (ratio-based, parent-based, always-on/off)
|
||||
* - Add custom span processors for filtering/enrichment
|
||||
* - Support context propagation across A2A (agent-to-agent) calls
|
||||
* - Add cost tracking per trace (token costs, API costs)
|
||||
* - (Done) static shutdownGlobal() method for agent switching, implemented below
|
||||
* See feature-plans/telemetry.md for details
|
||||
*/
|
||||
export class Telemetry {
|
||||
public tracer: Tracer = trace.getTracer('dexto');
|
||||
name: string = 'dexto-service';
|
||||
private _isInitialized: boolean = false;
|
||||
private _sdk?: NodeSDKType | undefined;
|
||||
private static _initPromise?: Promise<Telemetry> | undefined;
|
||||
private static _signalHandlers?: { sigterm: () => void; sigint: () => void } | undefined;
|
||||
|
||||
private constructor(config: OtelConfiguration, enabled: boolean, sdk?: NodeSDKType) {
|
||||
const serviceName = config.serviceName ?? 'dexto-service';
|
||||
const tracerName = config.tracerName ?? serviceName;
|
||||
|
||||
this.name = serviceName;
|
||||
this.tracer = trace.getTracer(tracerName);
|
||||
if (sdk) {
|
||||
this._sdk = sdk;
|
||||
}
|
||||
this._isInitialized = enabled && !!sdk;
|
||||
}
|
||||
|
||||
private static async buildTraceExporter(
|
||||
config: OtelConfiguration | undefined
|
||||
): Promise<ConsoleSpanExporterType | OTLPHttpExporterType | OTLPGrpcExporterType> {
|
||||
const e = config?.export;
|
||||
if (!e || e.type === 'console') {
|
||||
const { ConsoleSpanExporter } = await import('@opentelemetry/sdk-trace-base');
|
||||
return new ConsoleSpanExporter();
|
||||
}
|
||||
if (e.type === 'otlp') {
|
||||
if (e.protocol === 'grpc') {
|
||||
let OTLPGrpcExporter: typeof import('@opentelemetry/exporter-trace-otlp-grpc').OTLPTraceExporter;
|
||||
try {
|
||||
const mod = await import('@opentelemetry/exporter-trace-otlp-grpc');
|
||||
OTLPGrpcExporter = mod.OTLPTraceExporter;
|
||||
} catch (err) {
|
||||
const error = err as NodeJS.ErrnoException;
|
||||
if (error.code === 'ERR_MODULE_NOT_FOUND') {
|
||||
throw TelemetryError.exporterDependencyNotInstalled(
|
||||
'grpc',
|
||||
'@opentelemetry/exporter-trace-otlp-grpc'
|
||||
);
|
||||
}
|
||||
throw err;
|
||||
}
|
||||
const options: { url?: string } = {};
|
||||
if (e.endpoint) {
|
||||
options.url = e.endpoint;
|
||||
}
|
||||
return new OTLPGrpcExporter(options);
|
||||
}
|
||||
// default to http when omitted
|
||||
let OTLPHttpExporter: typeof import('@opentelemetry/exporter-trace-otlp-http').OTLPTraceExporter;
|
||||
try {
|
||||
const mod = await import('@opentelemetry/exporter-trace-otlp-http');
|
||||
OTLPHttpExporter = mod.OTLPTraceExporter;
|
||||
} catch (err) {
|
||||
const error = err as NodeJS.ErrnoException;
|
||||
if (error.code === 'ERR_MODULE_NOT_FOUND') {
|
||||
throw TelemetryError.exporterDependencyNotInstalled(
|
||||
'http',
|
||||
'@opentelemetry/exporter-trace-otlp-http'
|
||||
);
|
||||
}
|
||||
throw err;
|
||||
}
|
||||
const options: { url?: string; headers?: Record<string, string> } = {};
|
||||
if (e.endpoint) {
|
||||
options.url = e.endpoint;
|
||||
}
|
||||
if (e.headers) {
|
||||
options.headers = e.headers;
|
||||
}
|
||||
return new OTLPHttpExporter(options);
|
||||
}
|
||||
// Unreachable with the current schema (only 'otlp' and 'console' are allowed); fall back to the console exporter
|
||||
const { ConsoleSpanExporter } = await import('@opentelemetry/sdk-trace-base');
|
||||
return new ConsoleSpanExporter();
|
||||
}
|
||||
/**
|
||||
* Initialize telemetry with the given configuration
|
||||
* @param config - Optional telemetry configuration object
|
||||
* @param exporter - Optional custom span exporter (overrides config.export, useful for testing)
|
||||
* @returns Telemetry instance that can be used for tracing
|
||||
*/
|
||||
static async init(
|
||||
config: OtelConfiguration = {},
|
||||
exporter?: import('@opentelemetry/sdk-trace-base').SpanExporter
|
||||
): Promise<Telemetry> {
|
||||
try {
|
||||
// Return existing instance if already initialized
|
||||
if (globalThis.__TELEMETRY__) return globalThis.__TELEMETRY__;
|
||||
|
||||
// Return pending promise if initialization is in progress
|
||||
if (Telemetry._initPromise) return Telemetry._initPromise;
|
||||
|
||||
// Create and store initialization promise to prevent race conditions
|
||||
Telemetry._initPromise = (async () => {
|
||||
if (!globalThis.__TELEMETRY__) {
|
||||
// honor enabled=false: skip SDK registration
|
||||
const enabled = config.enabled !== false;
|
||||
|
||||
let sdk: NodeSDKType | undefined;
|
||||
if (enabled) {
|
||||
// Dynamic imports for optional OpenTelemetry dependencies
|
||||
let NodeSDK: typeof import('@opentelemetry/sdk-node').NodeSDK;
|
||||
let Resource: typeof import('@opentelemetry/resources').Resource;
|
||||
let HttpInstrumentation: typeof import('@opentelemetry/instrumentation-http').HttpInstrumentation;
|
||||
let UndiciInstrumentation: typeof import('@opentelemetry/instrumentation-undici').UndiciInstrumentation;
|
||||
let ATTR_SERVICE_NAME: string;
|
||||
|
||||
try {
|
||||
const sdkModule = await import('@opentelemetry/sdk-node');
|
||||
NodeSDK = sdkModule.NodeSDK;
|
||||
|
||||
const resourcesModule = await import('@opentelemetry/resources');
|
||||
Resource = resourcesModule.Resource;
|
||||
|
||||
// Import specific instrumentations instead of auto-instrumentations-node
|
||||
// This reduces install size by ~130MB while maintaining HTTP tracing for LLM API calls
|
||||
const httpInstModule = await import(
|
||||
'@opentelemetry/instrumentation-http'
|
||||
);
|
||||
HttpInstrumentation = httpInstModule.HttpInstrumentation;
|
||||
|
||||
const undiciInstModule = await import(
|
||||
'@opentelemetry/instrumentation-undici'
|
||||
);
|
||||
UndiciInstrumentation = undiciInstModule.UndiciInstrumentation;
|
||||
|
||||
const semanticModule = await import(
|
||||
'@opentelemetry/semantic-conventions'
|
||||
);
|
||||
ATTR_SERVICE_NAME = semanticModule.ATTR_SERVICE_NAME;
|
||||
} catch (importError) {
|
||||
const err = importError as NodeJS.ErrnoException;
|
||||
if (err.code === 'ERR_MODULE_NOT_FOUND') {
|
||||
throw TelemetryError.dependencyNotInstalled([
|
||||
'@opentelemetry/sdk-node',
|
||||
'@opentelemetry/instrumentation-http',
|
||||
'@opentelemetry/instrumentation-undici',
|
||||
'@opentelemetry/resources',
|
||||
'@opentelemetry/semantic-conventions',
|
||||
'@opentelemetry/sdk-trace-base',
|
||||
'@opentelemetry/exporter-trace-otlp-http',
|
||||
'@opentelemetry/exporter-trace-otlp-grpc',
|
||||
]);
|
||||
}
|
||||
throw importError;
|
||||
}
|
||||
|
||||
const resource = new Resource({
|
||||
[ATTR_SERVICE_NAME]: config.serviceName ?? 'dexto-service',
|
||||
});
|
||||
|
||||
// Use custom exporter if provided, otherwise build from config
|
||||
const spanExporter =
|
||||
exporter || (await Telemetry.buildTraceExporter(config));
|
||||
|
||||
// Dynamically import CompositeExporter to avoid loading OpenTelemetry at startup
|
||||
const { CompositeExporter } = await import('./exporters.js');
|
||||
const traceExporter =
|
||||
spanExporter instanceof CompositeExporter
|
||||
? spanExporter
|
||||
: new CompositeExporter([spanExporter]);
|
||||
|
||||
// Use specific instrumentations for HTTP tracing:
|
||||
// - HttpInstrumentation: traces http/https module calls
|
||||
// - UndiciInstrumentation: traces fetch() calls (Node.js 18+ uses undici internally)
|
||||
sdk = new NodeSDK({
|
||||
resource,
|
||||
traceExporter,
|
||||
instrumentations: [
|
||||
new HttpInstrumentation(),
|
||||
new UndiciInstrumentation(),
|
||||
],
|
||||
});
|
||||
|
||||
await sdk.start(); // registers the global provider → no ProxyTracer
|
||||
|
||||
// graceful shutdown (one-shot); swallow rejections so a failed flush cannot surface as an unhandled rejection
const sigterm = () => {
void sdk?.shutdown().catch(() => {});
};
const sigint = () => {
void sdk?.shutdown().catch(() => {});
};
|
||||
process.once('SIGTERM', sigterm);
|
||||
process.once('SIGINT', sigint);
|
||||
Telemetry._signalHandlers = { sigterm, sigint };
|
||||
}
|
||||
|
||||
globalThis.__TELEMETRY__ = new Telemetry(config, enabled, sdk);
|
||||
}
|
||||
return globalThis.__TELEMETRY__!;
|
||||
})();
|
||||
|
||||
// Await the promise so failures are caught by outer try/catch
|
||||
// This ensures _initPromise is cleared on failure, allowing re-initialization
|
||||
return await Telemetry._initPromise;
|
||||
} catch (error) {
|
||||
// Clear init promise so subsequent calls can retry
|
||||
Telemetry._initPromise = undefined;
|
||||
// Re-throw typed errors as-is, wrap unknown errors
|
||||
if (error instanceof DextoRuntimeError) {
|
||||
throw error;
|
||||
}
|
||||
throw TelemetryError.initializationFailed(
|
||||
error instanceof Error ? error.message : String(error),
|
||||
error
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
static getActiveSpan() {
|
||||
const span = trace.getActiveSpan();
|
||||
return span;
|
||||
}
|
||||
|
||||
/**
|
||||
* Get the global telemetry instance
|
||||
* @throws {DextoRuntimeError} If telemetry has not been initialized
|
||||
* @returns {Telemetry} The global telemetry instance
|
||||
*/
|
||||
static get(): Telemetry {
|
||||
if (!globalThis.__TELEMETRY__) {
|
||||
throw TelemetryError.notInitialized();
|
||||
}
|
||||
return globalThis.__TELEMETRY__;
|
||||
}
|
||||
|
||||
/**
|
||||
* Check if global telemetry instance exists
|
||||
* @returns True if a global instance exists (including one created with enabled=false), false otherwise
|
||||
*/
|
||||
static hasGlobalInstance(): boolean {
|
||||
return globalThis.__TELEMETRY__ !== undefined;
|
||||
}
|
||||
|
||||
/**
|
||||
* Shutdown global telemetry instance
|
||||
* Used during agent switching to cleanly shutdown old agent's telemetry
|
||||
* before initializing new agent's telemetry with potentially different config
|
||||
* @returns Promise that resolves when shutdown is complete
|
||||
*/
|
||||
static async shutdownGlobal(): Promise<void> {
|
||||
if (globalThis.__TELEMETRY__) {
|
||||
await globalThis.__TELEMETRY__.shutdown();
|
||||
globalThis.__TELEMETRY__ = undefined;
|
||||
}
|
||||
// Also clear the init promise to allow re-initialization
|
||||
Telemetry._initPromise = undefined;
|
||||
}
|
||||
|
||||
/**
|
||||
* Checks if the Telemetry instance has been successfully initialized.
|
||||
* @returns True if the instance is initialized, false otherwise.
|
||||
*/
|
||||
public isInitialized(): boolean {
|
||||
return this._isInitialized;
|
||||
}
|
||||
|
||||
static setBaggage(baggage: Record<string, BaggageEntry>, ctx: Context = otlpContext.active()) {
|
||||
const currentBaggage = Object.fromEntries(
|
||||
propagation.getBaggage(ctx)?.getAllEntries() ?? []
|
||||
);
|
||||
const newCtx = propagation.setBaggage(
|
||||
ctx,
|
||||
propagation.createBaggage({
|
||||
...currentBaggage,
|
||||
...baggage,
|
||||
})
|
||||
);
|
||||
return newCtx;
|
||||
}
|
||||
|
||||
static withContext(ctx: Context, fn: () => void) {
|
||||
return otlpContext.with(ctx, fn);
|
||||
}
|
||||
|
||||
/**
|
||||
* Forces pending spans to be exported immediately.
|
||||
* Useful for testing to ensure spans are available in exporters.
|
||||
*/
|
||||
public async forceFlush(): Promise<void> {
|
||||
if (this._isInitialized) {
|
||||
// Access the global tracer provider and force flush
|
||||
const provider = trace.getTracerProvider() as any;
|
||||
if (provider && typeof provider.forceFlush === 'function') {
|
||||
await provider.forceFlush();
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Shuts down the OpenTelemetry SDK, flushing any pending spans.
|
||||
* This should be called before the application exits.
|
||||
*
|
||||
* Uses two-phase shutdown:
|
||||
* 1. Best-effort flush - Try to export pending spans (can fail if backend unavailable)
|
||||
* 2. Force cleanup - Always clear global state to allow re-initialization
|
||||
*
|
||||
* This ensures agent switching works even when telemetry export fails.
|
||||
*/
|
||||
public async shutdown(): Promise<void> {
|
||||
if (this._sdk) {
|
||||
try {
|
||||
// Phase 1: Best-effort flush pending spans to backend
|
||||
// This can fail if Jaeger/OTLP collector is unreachable
|
||||
await this._sdk.shutdown();
|
||||
} catch (error) {
|
||||
// Don't throw - log warning and continue with cleanup
|
||||
// Telemetry is observability infrastructure, not core functionality
|
||||
const errorMsg = error instanceof Error ? error.message : String(error);
|
||||
logger.warn(`Telemetry shutdown failed to flush spans (non-blocking): ${errorMsg}`);
|
||||
} finally {
|
||||
// Phase 2: Force cleanup - MUST always happen regardless of flush success
|
||||
// This ensures we can reinitialize telemetry for agent switching
|
||||
this._isInitialized = false;
|
||||
globalThis.__TELEMETRY__ = undefined; // Clear the global instance
|
||||
|
||||
// Cleanup signal handlers to prevent leaks
|
||||
if (Telemetry._signalHandlers) {
|
||||
process.off('SIGTERM', Telemetry._signalHandlers.sigterm);
|
||||
process.off('SIGINT', Telemetry._signalHandlers.sigint);
|
||||
Telemetry._signalHandlers = undefined;
|
||||
}
|
||||
|
||||
// Clear references for GC and re-initialization
|
||||
this._sdk = undefined;
|
||||
Telemetry._initPromise = undefined;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
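The two-phase shutdown in `shutdown()` can be reduced to a `try`/`catch`/`finally` shape: the flush is attempted and any failure is logged, while state cleanup always runs. The sketch below shows that shape with a hypothetical `Flushable` resource and `Handle` class; the names and the `warnings` array (standing in for the logger) are assumptions for illustration.

```typescript
// Hypothetical resource whose shutdown may fail (e.g. collector unreachable).
type Flushable = { shutdown(): Promise<void> };

class Handle {
    initialized = true;
    warnings: string[] = []; // stands in for logger.warn
    private sdk: Flushable | undefined;

    constructor(sdk: Flushable) {
        this.sdk = sdk;
    }

    async shutdown(): Promise<void> {
        if (!this.sdk) return; // nothing to shut down
        try {
            // Phase 1: best-effort flush; may reject.
            await this.sdk.shutdown();
        } catch (error) {
            // Record the failure instead of throwing.
            const msg = error instanceof Error ? error.message : String(error);
            this.warnings.push(`flush failed (non-blocking): ${msg}`);
        } finally {
            // Phase 2: always clear state so re-initialization is possible.
            this.initialized = false;
            this.sdk = undefined;
        }
    }
}

// A resource whose flush always fails.
const failing: Flushable = {
    shutdown: () => Promise.reject(new Error('collector unreachable')),
};
const handle = new Handle(failing);
```

Calling `handle.shutdown()` resolves even though the flush rejects. Afterwards `handle.initialized` is `false`, one warning has been recorded, and a second call is a no-op.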
22
dexto/packages/core/src/telemetry/types.ts
Normal file
@@ -0,0 +1,22 @@
import type { ReadableSpan } from '@opentelemetry/sdk-trace-base';

/**
 * Trace data structure for storage/retrieval
 * Used by telemetry storage exporters for persisting trace data
 */
export type Trace = {
    id: string;
    parentSpanId: string;
    name: string;
    traceId: string;
    scope: string;
    kind: ReadableSpan['kind'];
    attributes: ReadableSpan['attributes'];
    status: ReadableSpan['status'];
    events: ReadableSpan['events'];
    links: ReadableSpan['links'];
    other: Record<string, any>;
    startTime: number;
    endTime: number;
    createdAt: string;
};
82
dexto/packages/core/src/telemetry/utils.ts
Normal file
@@ -0,0 +1,82 @@
|
||||
import { propagation } from '@opentelemetry/api';
|
||||
import type { Context, Span } from '@opentelemetry/api';
|
||||
import { Telemetry } from './telemetry.js';
|
||||
import type { IDextoLogger } from '../logger/v2/types.js';
|
||||
|
||||
// Helper function to check if telemetry is active
|
||||
export function hasActiveTelemetry(logger?: IDextoLogger): boolean {
|
||||
logger?.silly('hasActiveTelemetry called.');
|
||||
try {
|
||||
const telemetryInstance = Telemetry.get();
|
||||
const isActive = telemetryInstance.isInitialized();
|
||||
logger?.silly(`hasActiveTelemetry: Telemetry is initialized: ${isActive}`);
|
||||
return isActive;
|
||||
} catch (error) {
|
||||
logger?.silly(
|
||||
`hasActiveTelemetry: Telemetry not active or initialized. Error: ${error instanceof Error ? error.message : String(error)}`
|
||||
);
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Get baggage values from context
|
||||
* @param ctx The context to get baggage values from
|
||||
* @param logger Optional logger instance
|
||||
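* @returns The requestId, componentName, runId, threadId, resourceId and sessionId baggage values; each is undefined when the entry is absent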
|
||||
*/
|
||||
export function getBaggageValues(ctx: Context, logger?: IDextoLogger) {
|
||||
logger?.silly('getBaggageValues called.');
|
||||
const currentBaggage = propagation.getBaggage(ctx);
|
||||
const requestId = currentBaggage?.getEntry('http.request_id')?.value;
|
||||
const componentName = currentBaggage?.getEntry('componentName')?.value;
|
||||
const runId = currentBaggage?.getEntry('runId')?.value;
|
||||
const threadId = currentBaggage?.getEntry('threadId')?.value;
|
||||
const resourceId = currentBaggage?.getEntry('resourceId')?.value;
|
||||
const sessionId = currentBaggage?.getEntry('sessionId')?.value;
|
||||
logger?.silly(
|
||||
`getBaggageValues: Extracted - requestId: ${requestId}, componentName: ${componentName}, runId: ${runId}, threadId: ${threadId}, resourceId: ${resourceId}, sessionId: ${sessionId}`
|
||||
);
|
||||
return {
|
||||
requestId,
|
||||
componentName,
|
||||
runId,
|
||||
threadId,
|
||||
resourceId,
|
||||
sessionId,
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Attaches baggage values from the given context to the provided span as attributes.
|
||||
* @param span The OpenTelemetry Span to add attributes to.
|
||||
* @param ctx The OpenTelemetry Context from which to extract baggage values.
|
||||
* @param logger Optional logger instance
|
||||
*/
|
||||
export function addBaggageAttributesToSpan(span: Span, ctx: Context, logger?: IDextoLogger): void {
|
||||
logger?.debug('addBaggageAttributesToSpan called.');
|
||||
const { requestId, componentName, runId, threadId, resourceId, sessionId } = getBaggageValues(
|
||||
ctx,
|
||||
logger
|
||||
);
|
||||
|
||||
if (componentName) {
|
||||
span.setAttribute('componentName', componentName);
|
||||
}
|
||||
if (runId) {
|
||||
span.setAttribute('runId', runId);
|
||||
}
|
||||
if (requestId) {
|
||||
span.setAttribute('http.request_id', requestId);
|
||||
}
|
||||
if (threadId) {
|
||||
span.setAttribute('threadId', threadId);
|
||||
}
|
||||
if (resourceId) {
|
||||
span.setAttribute('resourceId', resourceId);
|
||||
}
|
||||
if (sessionId) {
|
||||
span.setAttribute('sessionId', sessionId);
|
||||
}
|
||||
logger?.debug('addBaggageAttributesToSpan: Baggage attributes added to span.');
|
||||
}
|
||||
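The baggage helpers follow two rules: newer entries override existing ones when baggage is merged (as in `Telemetry.setBaggage`), and only entries that are actually present are copied onto a span. A dependency-free sketch of both rules follows, using plain objects in place of the OpenTelemetry `Baggage` and `Span` types. `Entry`, `PlainBaggage`, `mergeBaggage` and `toAttributes` are hypothetical names.

```typescript
// Plain-object stand-ins for OpenTelemetry baggage; not the real API types.
type Entry = { value: string };
type PlainBaggage = Record<string, Entry>;

// Later entries win, mirroring the spread order used when merging baggage.
function mergeBaggage(current: PlainBaggage, incoming: PlainBaggage): PlainBaggage {
    return { ...current, ...incoming };
}

// The keys that the span-attribute helper copies.
const BAGGAGE_KEYS = [
    'http.request_id',
    'componentName',
    'runId',
    'threadId',
    'resourceId',
    'sessionId',
];

// Copy only keys that are present with a non-empty value.
function toAttributes(baggage: PlainBaggage): Record<string, string> {
    const attrs: Record<string, string> = {};
    for (const key of BAGGAGE_KEYS) {
        const value = baggage[key]?.value;
        if (value) attrs[key] = value;
    }
    return attrs;
}

const merged = mergeBaggage(
    { sessionId: { value: 'a' }, runId: { value: 'r1' } },
    { sessionId: { value: 'b' } }
);
const attrs = toAttributes(merged);
```

Here `attrs` holds `runId` from the existing baggage and the overriding `sessionId` from the incoming baggage. The four absent keys are skipped rather than written as `undefined`.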