129 changes: 129 additions & 0 deletions core/src/main/java/io/temporal/samples/temporalmetricsdemo/README.md
@@ -0,0 +1,129 @@
# Temporal Cloud OpenMetrics → Prometheus → Grafana (Step-by-step)

This demo shows how to scrape **[Temporal Cloud OpenMetrics](https://docs.temporal.io/cloud/metrics/openmetrics/)**
with **Prometheus** and **visualize them in Grafana**.

It uses the Grafana Temporal mixin dashboard template:
https://github.com/grafana/jsonnet-libs/blob/master/temporal-mixin/dashboards/temporal-overview.json

Once imported or provisioned, the dashboard shows the key Temporal metrics in a ready-made layout.
It may take a second to load; refresh the page if it takes longer than that.

![Grafana dashboard 4](docs/images/img4.png)

Grafana dashboard view:

**Temporal Cloud OpenMetrics**
![Grafana dashboard 1](docs/images/img1.png)
![Grafana dashboard 2](docs/images/img2.png)


**Worker metrics**
![Grafana dashboard 3](docs/images/img6.png)
![Grafana dashboard 4](docs/images/img7.png)

Prometheus:

**Cloud metrics**
![Prometheus Cloud metrics](docs/images/img3.png)

**Worker metrics**
![Prometheus Worker metrics](docs/images/img3.png)

---

## 1) Create Service Account + API Key (Temporal Cloud)

1. Create a service account with the **Metrics Read-Only** role.
   OpenMetrics auth reference:
   https://docs.temporal.io/production-deployment/cloud/metrics/openmetrics/api-reference#authentication

   In the Temporal Cloud UI:
   - Go to **Settings → Service Accounts**.
   - Create a service account with the **Metrics Read-Only** role.
   - Generate an **API key** (copy it; you will need it later).

2. Add your namespace and assign the namespace permission.

![metrics read only service account](docs/images/img8.png)

**Note:** Using a single service account is only suitable for testing. In production, use two service accounts,
one for running workflows and one for reading metrics, so that you have more control over the API keys.


## 2) Update Prometheus scrape config

Edit `prometheus/config.yml` so that the scrape job uses your namespace:
```
params:
  namespaces: [ '<namespace>.<account-id>' ]
```
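
Before relying on Prometheus, it can help to confirm that the API key and the `namespaces` filter are accepted. The sketch below builds the kind of request a scraper would send. The endpoint URL `https://metrics.temporal.io/v1/metrics`, the Bearer scheme, and the names `OpenMetricsCheck` and `buildRequest` are assumptions that are not taken from this demo; check them against the OpenMetrics API reference linked in step 1.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class OpenMetricsCheck {
  // Assumed endpoint; verify it against the Temporal Cloud OpenMetrics docs.
  static final String ENDPOINT = "https://metrics.temporal.io/v1/metrics";

  // Builds a GET request filtered to one namespace, authenticated with the API key.
  static HttpRequest buildRequest(String apiKey, String namespace) {
    String query = "namespaces=" + URLEncoder.encode(namespace, StandardCharsets.UTF_8);
    return HttpRequest.newBuilder(URI.create(ENDPOINT + "?" + query))
        .header("Authorization", "Bearer " + apiKey)
        .GET()
        .build();
  }

  public static void main(String[] args) throws Exception {
    String apiKey = System.getenv("TEMPORAL_API_KEY");
    String namespace = System.getenv("TEMPORAL_NAMESPACE");
    if (apiKey == null || namespace == null) {
      System.out.println("Set TEMPORAL_API_KEY and TEMPORAL_NAMESPACE first.");
      return;
    }
    HttpResponse<String> response =
        HttpClient.newHttpClient()
            .send(buildRequest(apiKey, namespace), HttpResponse.BodyHandlers.ofString());
    System.out.println("HTTP " + response.statusCode());
    // Print only the first few lines; the full payload can be large.
    response.body().lines().limit(10).forEach(System.out::println);
  }
}
```

A 200 response with metric lines suggests the key and namespace are usable; a 401 or 403 usually points at the key or the role assignment.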


## 3) Start Prometheus + Grafana

    docker compose up -d
    docker compose ps

## 4) Run the sample and view the cloud metrics

Terminal 1

**Export the environment variables below, then run WorkerMain:**

    export TEMPORAL_API_KEY="<api_key_created_above>"
    export TEMPORAL_NAMESPACE="<namespace>.<account-id>"
    # Regional endpoint, used for API key auth
    export TEMPORAL_ADDRESS="<region>.<cloud-provider>.api.temporal.io:7233"

- `./gradlew -q execute -PmainClass=io.temporal.samples.temporalmetricsdemo.WorkerMain`

Terminal 2

**Export the same environment variables again, then run Starter.** Set `METRICS_PORT` to a value other than the
worker's default of 9464 so that the two metrics endpoints do not collide:

- `METRICS_PORT=9465 ./gradlew -q execute -PmainClass=io.temporal.samples.temporalmetricsdemo.Starter`

Starter logs:
```
➜ samples-java git:(deepika/openmetrics-demo) ✗ METRICS_PORT=9465 ./gradlew -q execute -PmainClass=io.temporal.samples.temporalmetricsdemo.Starter
Worker metrics exposed at http://0.0.0.0:9465/metrics
13:58:44.102 { } [main] INFO i.t.s.WorkflowServiceStubsImpl - Created WorkflowServiceStubs for channel: ManagedChannelOrphanWrapper{delegate=ManagedChannelImpl{logId=1, target=deepika-test-namespace.a2dd6.tmprl.cloud:7233}}

=== Starting scenario: success workflowId=scenario-success-6120c15f-b2f0-40cb-b3a6-f39bf0af6698 ===
Scenario=success Result=Hello Temporal

=== Starting scenario: fail workflowId=scenario-fail-0ff499af-1c3c-4a78-b75d-7d24c66ef46d ===
Scenario=fail ended: WorkflowFailedException - Workflow execution {workflowId='scenario-fail-0ff499af-1c3c-4a78-b75d-7d24c66ef46d', runId='', workflowType='ScenarioWorkflow'} failed. Metadata: {closeEventType='EVENT_TYPE_WORKFLOW_EXECUTION_FAILED', retryState='RETRY_STATE_RETRY_POLICY_NOT_SET', workflowTaskCompletedEventId=10'}

=== Starting scenario: timeout workflowId=scenario-timeout-e63954fb-c39b-4fb4-9dc9-ee2fa2835b52 ===
Scenario=timeout ended: WorkflowFailedException - Workflow execution {workflowId='scenario-timeout-e63954fb-c39b-4fb4-9dc9-ee2fa2835b52', runId='', workflowType='ScenarioWorkflow'} timed out. Metadata: {closeEventType='EVENT_TYPE_WORKFLOW_EXECUTION_TIMED_OUT', retryState='RETRY_STATE_RETRY_POLICY_NOT_SET'}

=== Starting scenario: continue workflowId=scenario-continue-26e780ab-1f0a-4222-9b2b-5c09c62cb824 ===
Scenario=continue Result=Hello Temporal

=== Starting scenario: cancel workflowId=scenario-cancel-08c9d1f6-6bfb-4ee9-9a5d-23b35fe1af7c ===
Scenario=cancel ended: WorkflowFailedException - Workflow execution {workflowId='scenario-cancel-08c9d1f6-6bfb-4ee9-9a5d-23b35fe1af7c', runId=''} was cancelled. Metadata: {closeEventType='EVENT_TYPE_WORKFLOW_EXECUTION_CANCELED', retryState='RETRY_STATE_NON_RETRYABLE_FAILURE'}
<===========--> 87% EXECUTING [18m 6s]

```
The worker logs will show some failures because the demo intentionally fails workflows to generate data.
Allow a few seconds for data to appear in both dashboards, and run the Starter a couple of times so that the rate
queries have enough samples to display properly.
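
Rate-style queries need at least two scrapes with a changing counter in between, which is why a single run may leave the panels empty. As a rough illustration of the arithmetic, here is a simplified two-sample version. It is not Prometheus' actual `rate()` implementation, which also handles counter resets and extrapolation, and the sample values and the name `RateSketch` are made up.

```java
class RateSketch {
  /** Per-second increase between two counter samples; NaN if the time window is empty. */
  static double perSecondRate(double v1, long t1Seconds, double v2, long t2Seconds) {
    if (t2Seconds <= t1Seconds) {
      return Double.NaN;
    }
    return (v2 - v1) / (t2Seconds - t1Seconds);
  }

  public static void main(String[] args) {
    // Hypothetical samples: 5 completed workflows at t=0s, 11 at t=60s.
    System.out.println(perSecondRate(5, 0, 11, 60)); // 0.1
    // No new workflows between scrapes, so the rate is flat.
    System.out.println(perSecondRate(5, 0, 5, 60)); // 0.0
  }
}
```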

## 5) View Grafana dashboard

http://localhost:3001/

- Username: admin
- Password: admin

You should see the Temporal Cloud OpenMetrics dashboard.

## 6) Verify metrics in Prometheus

Prometheus: http://localhost:9093/

Go to:
- **Status → Targets** (make sure the scrape target is UP)
- **Graph** tab (search for Temporal metrics and run a query)
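
The same check can be scripted against the Prometheus HTTP API instead of the UI. The sketch below runs the instant query `up` through `/api/v1/query`; the port 9093 comes from this README, while the class name `PromQueryCheck` and the choice of query are illustrative assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PromQueryCheck {
  // Builds the instant-query URL for a PromQL expression.
  static URI queryUri(String baseUrl, String promQl) {
    String encoded = URLEncoder.encode(promQl, StandardCharsets.UTF_8);
    return URI.create(baseUrl + "/api/v1/query?query=" + encoded);
  }

  public static void main(String[] args) throws InterruptedException {
    // "up" reports 1 for every scrape target that Prometheus can reach.
    URI uri = queryUri("http://localhost:9093", "up");
    HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
    try {
      HttpResponse<String> response =
          HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
      System.out.println("HTTP " + response.statusCode());
      System.out.println(response.body());
    } catch (IOException e) {
      System.out.println("Prometheus not reachable at " + uri + ": " + e);
    }
  }
}
```

In the JSON response, a target whose sample value is `"1"` is being scraped successfully.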

@@ -0,0 +1,60 @@
package io.temporal.samples.temporalmetricsdemo;

import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.client.WorkflowStub;
import io.temporal.samples.temporalmetricsdemo.workflows.ScenarioWorkflow;
import java.time.Duration;
import java.util.UUID;

public class Starter {

  public static void main(String[] args) throws Exception {
    WorkflowClient client = TemporalConnection.client();

    String name = "Temporal";
    String[] scenarios = {"success", "fail", "timeout", "continue", "cancel"};

    for (String scenario : scenarios) {
      String wid = "scenario-" + scenario + "-" + UUID.randomUUID();

      WorkflowOptions.Builder optionsBuilder =
          WorkflowOptions.newBuilder()
              .setTaskQueue(TemporalConnection.TASK_QUEUE)
              .setWorkflowId(wid);

      // workflow timeout
      if ("timeout".equalsIgnoreCase(scenario)) {
        optionsBuilder.setWorkflowRunTimeout(Duration.ofSeconds(3));
      }

      ScenarioWorkflow wf = client.newWorkflowStub(ScenarioWorkflow.class, optionsBuilder.build());

      System.out.println("\n=== Starting scenario: " + scenario + " workflowId=" + wid + " ===");

      try {
        if ("cancel".equalsIgnoreCase(scenario)) {
          WorkflowClient.start(wf::run, scenario, name);
          Thread.sleep(2000);
          WorkflowStub untyped = client.newUntypedWorkflowStub(wid);
          untyped.cancel();
          untyped.getResult(String.class);
          continue;
        }

        // normal synchronous execution
        String result = wf.run(scenario, name);
        System.out.println("Scenario=" + scenario + " Result=" + result);

      } catch (Exception e) {
        System.out.println(
            "Scenario="
                + scenario
                + " ended: "
                + e.getClass().getSimpleName()
                + " - "
                + e.getMessage());
      }
    }
  }
}
@@ -0,0 +1,136 @@
package io.temporal.samples.temporalmetricsdemo;

import com.sun.net.httpserver.HttpServer;
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.tally.StatsReporter;
import com.uber.m3.util.Duration;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public final class TemporalConnection {
  private TemporalConnection() {}

  // Required: MUST be set in env. No defaults.
  public static final String NAMESPACE = envRequired("TEMPORAL_NAMESPACE");
  public static final String ADDRESS = envRequired("TEMPORAL_ADDRESS");
  public static final String TASK_QUEUE = env("TASK_QUEUE", "openmetrics-task-queue");

  private static final int METRICS_PORT = envInt("METRICS_PORT", 9464);
  private static final int METRICS_REPORT_SECONDS = envInt("METRICS_REPORT_SECONDS", 10);

  private static volatile WorkflowClient CLIENT;
  private static volatile PrometheusMeterRegistry PROM;
  private static volatile boolean METRICS_STARTED;

  public static WorkflowClient client() {
    if (CLIENT != null) return CLIENT;
    synchronized (TemporalConnection.class) {
      if (CLIENT != null) return CLIENT;

      String apiKey = envRequired("TEMPORAL_API_KEY");

      // Validation
      validate();
      System.out.println("TemporalConnection: ADDRESS=" + ADDRESS);
      System.out.println("TemporalConnection: NAMESPACE=" + NAMESPACE);

      Scope scope = metricsScope();

      WorkflowServiceStubs service =
          WorkflowServiceStubs.newServiceStubs(
              WorkflowServiceStubsOptions.newBuilder()
                  .setTarget(ADDRESS)
                  .setEnableHttps(true)
                  .addApiKey(() -> apiKey)
                  .setMetricsScope(scope)
                  .build());

      CLIENT =
          WorkflowClient.newInstance(
              service, WorkflowClientOptions.newBuilder().setNamespace(NAMESPACE).build());

      return CLIENT;
    }
  }

  private static void validate() {
    if (NAMESPACE.isBlank()) {
      throw new IllegalStateException("TEMPORAL_NAMESPACE must be set (non-blank).");
    }
    if (ADDRESS.isBlank()) {
      throw new IllegalStateException("TEMPORAL_ADDRESS must be set (non-blank).");
    }
  }

  private static Scope metricsScope() {
    synchronized (TemporalConnection.class) {
      if (PROM == null) PROM = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

      StatsReporter reporter = new MicrometerClientStatsReporter(PROM);
      Scope scope =
          new RootScopeBuilder()
              .reporter(reporter)
              .reportEvery(Duration.ofSeconds(METRICS_REPORT_SECONDS));

      if (!METRICS_STARTED) {
        METRICS_STARTED = true;
        startMetricsHttpServer(PROM);
      }
      return scope;
    }
  }

  private static void startMetricsHttpServer(PrometheusMeterRegistry registry) {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress("0.0.0.0", METRICS_PORT), 0);
      server.createContext(
          "/metrics",
          exchange -> {
            byte[] body = registry.scrape().getBytes(StandardCharsets.UTF_8);
            exchange
                .getResponseHeaders()
                .add("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
              os.write(body);
            }
          });
      server.start();
      System.out.println("Worker metrics at http://0.0.0.0:" + METRICS_PORT + "/metrics");
    } catch (Exception e) {
      throw new RuntimeException("Failed to start /metrics endpoint", e);
    }
  }

  private static String env(String key, String def) {
    String v = System.getenv(key);
    return (v == null || v.isBlank()) ? def : v.trim();
  }

  private static String envRequired(String key) {
    String v = System.getenv(key);
    if (v == null || v.isBlank()) {
      throw new IllegalStateException("Missing required env var: " + key);
    }
    return v.trim();
  }

  private static int envInt(String key, int def) {
    String v = System.getenv(key);
    if (v == null || v.isBlank()) return def;
    try {
      return Integer.parseInt(v.trim());
    } catch (Exception e) {
      return def;
    }
  }
}
@@ -0,0 +1,30 @@
package io.temporal.samples.temporalmetricsdemo;

import io.temporal.client.WorkflowClient;
import io.temporal.samples.temporalmetricsdemo.activities.ScenarioActivitiesImpl;
import io.temporal.samples.temporalmetricsdemo.workflows.ScenarioWorkflowImpl;
import io.temporal.worker.WorkerFactory;

public class WorkerMain {
  public static void main(String[] args) throws Exception {
    WorkflowClient client = TemporalConnection.client();

    WorkerFactory factory = WorkerFactory.newInstance(client);
    io.temporal.worker.Worker worker = factory.newWorker(TemporalConnection.TASK_QUEUE);

    worker.registerWorkflowImplementationTypes(ScenarioWorkflowImpl.class);
    worker.registerActivitiesImplementations(new ScenarioActivitiesImpl());

    factory.start();
    System.out.println(
        "Worker started. namespace="
            + TemporalConnection.NAMESPACE
            + " taskQueue="
            + TemporalConnection.TASK_QUEUE
            + " metrics=http://0.0.0.0:"
            + System.getenv().getOrDefault("METRICS_PORT", "9464")
            + "/metrics");

    Thread.currentThread().join();
  }
}
@@ -0,0 +1,10 @@
package io.temporal.samples.temporalmetricsdemo.activities;

import io.temporal.activity.ActivityInterface;
import io.temporal.activity.ActivityMethod;

@ActivityInterface
public interface ScenarioActivities {
  @ActivityMethod
  String doWork(String name, String scenario);
}