Create Monitor
Create a monitor using the specified options.
Monitor Types
The type of monitor, chosen from:
- anomaly: query alert
- APM: query alert or trace-analytics alert
- composite: composite
- custom: service check
- event: event alert
- forecast: query alert
- host: service check
- integration: query alert or service check
- live process: process alert
- logs: log alert
- metric: metric alert
- network: service check
- outlier: query alert
- process: service check
- rum: rum alert
- SLO: slo alert
- watchdog: event alert
- event-v2: event-v2 alert
Query Types
Metric Alert Query
Example: time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator #
- time_aggr: avg, sum, max, min, change, or pct_change
- time_window: last_#m (with # between 1 and 2880 depending on the monitor type), last_#h (with # between 1 and 48 depending on the monitor type), or last_1d
- space_aggr: avg, sum, min, or max
- tags: one or more tags (comma-separated), or *
- key: a 'key' in key:value tag syntax; defines a separate alert for each tag in the group (multi-alert)
- operator: <, <=, >, >=, ==, or !=
- #: an integer or decimal number used to set the threshold
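The metric alert format above can be assembled programmatically. The following is a minimal sketch, not part of any Datadog client library: `metric_alert_query` and its parameter names are hypothetical, and it only checks the aggregators and operators listed above (the time window is passed through unvalidated).

```python
# Allowed values come from the parameter list above. change and pct_change
# are left out because they use the separate change_aggr(...) form.
TIME_AGGRS = {"avg", "sum", "max", "min"}
SPACE_AGGRS = {"avg", "sum", "min", "max"}
OPERATORS = {"<", "<=", ">", ">=", "==", "!="}


def metric_alert_query(time_aggr, time_window, space_aggr, metric,
                       threshold, operator=">", tags="*", by=None):
    """Build time_aggr(time_window):space_aggr:metric{tags} [by {key}] operator #."""
    if time_aggr not in TIME_AGGRS:
        raise ValueError(f"unsupported time_aggr: {time_aggr!r}")
    if space_aggr not in SPACE_AGGRS:
        raise ValueError(f"unsupported space_aggr: {space_aggr!r}")
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    query = f"{time_aggr}({time_window}):{space_aggr}:{metric}{{{tags}}}"
    if by:
        # "by {key}" defines a separate alert per tag value (multi-alert).
        query += f" by {{{by}}}"
    return f"{query} {operator} {threshold}"


print(metric_alert_query("avg", "last_5m", "sum", "system.net.bytes_rcvd",
                         threshold=100, tags="host:host0"))
# prints: avg(last_5m):sum:system.net.bytes_rcvd{host:host0} > 100
```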
If you are using the _change_ or _pct_change_ time aggregator, instead use change_aggr(time_aggr(time_window), timeshift):space_aggr:metric{tags} [by {key}] operator # with:
- change_aggr: change, pct_change
- time_aggr: avg, sum, max, min
- time_window: last_#m (between 1 and 2880 depending on the monitor type), last_#h (between 1 and 48 depending on the monitor type), or last_#d (1 or 2)
- timeshift: #m_ago (5, 10, 15, or 30), #h_ago (1, 2, or 4), or 1d_ago
Use this to create an outlier monitor using the following query:
avg(last_30m):outliers(avg:system.cpu.user{role:es-events-data} by {host}, 'dbscan', 7) > 0
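For the change and pct_change form described earlier in this section, the timeshift values are a small fixed set, so a builder can reject invalid ones before the query is submitted. This is a sketch under assumptions: `change_alert_query` is a hypothetical helper, and the separator between the inner aggregation and the timeshift (a comma followed by a space) simply mirrors the template shown above.

```python
CHANGE_AGGRS = {"change", "pct_change"}
TIME_AGGRS = {"avg", "sum", "max", "min"}
# #m_ago (5, 10, 15, or 30), #h_ago (1, 2, or 4), or 1d_ago
TIMESHIFTS = {"5m_ago", "10m_ago", "15m_ago", "30m_ago",
              "1h_ago", "2h_ago", "4h_ago", "1d_ago"}


def change_alert_query(change_aggr, time_aggr, time_window, timeshift,
                       space_aggr, metric, threshold, operator=">", tags="*"):
    """Build change_aggr(time_aggr(time_window), timeshift):space_aggr:metric{tags} operator #."""
    if change_aggr not in CHANGE_AGGRS:
        raise ValueError(f"unsupported change_aggr: {change_aggr!r}")
    if time_aggr not in TIME_AGGRS:
        raise ValueError(f"unsupported time_aggr: {time_aggr!r}")
    if timeshift not in TIMESHIFTS:
        raise ValueError(f"unsupported timeshift: {timeshift!r}")
    head = f"{change_aggr}({time_aggr}({time_window}), {timeshift})"
    return f"{head}:{space_aggr}:{metric}{{{tags}}} {operator} {threshold}"


print(change_alert_query("pct_change", "avg", "last_5m", "1h_ago",
                         "avg", "system.cpu.user", threshold=50))
# prints: pct_change(avg(last_5m), 1h_ago):avg:system.cpu.user{*} > 50
```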
Service Check Query
Example: "check".over(tags).last(count).by(group).count_by_status()
- check: name of the check, e.g. datadog.agent.up
- tags: one or more quoted tags (comma-separated), or "*", e.g. .over("env:prod", "role:db"); over cannot be blank.
- count: must be greater than or equal to your max threshold (defined in the options). It is limited to 100. For example, if you've specified to notify on 1 critical, 3 ok, and 2 warn statuses, count should be at least 3.
- group: must be specified for check monitors. Per-check grouping is already explicitly known for some service checks. For example, Postgres integration monitors are tagged by db, host, and port, and Network monitors by host, instance, and url. See Service Checks documentation for more information.
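The constraints on count and over lend themselves to a quick pre-flight check. The sketch below uses a hypothetical `service_check_query` helper; quoting each group name in by(...) the same way as the tags is an assumption, since the template above does not show it.

```python
import json


def service_check_query(check, tags, count, group, thresholds=None):
    """Build "check".over(tags).last(count).by(group).count_by_status()."""
    if not tags:
        raise ValueError('over cannot be blank; pass ["*"] to match everything')
    if not group:
        raise ValueError("group must be specified for check monitors")
    if not 1 <= count <= 100:
        raise ValueError("count is limited to 100")
    if thresholds and count < max(thresholds.values()):
        # count must cover the largest status threshold in the options.
        raise ValueError("count must be >= the max threshold")
    # json.dumps adds the surrounding double quotes for each value.
    over = ", ".join(json.dumps(tag) for tag in tags)
    by = ", ".join(json.dumps(name) for name in group)
    return (f"{json.dumps(check)}.over({over}).last({count})"
            f".by({by}).count_by_status()")


print(service_check_query("datadog.agent.up", ["env:prod", "role:db"], 3,
                          ["host"], thresholds={"critical": 1, "ok": 3, "warning": 2}))
# prints: "datadog.agent.up".over("env:prod", "role:db").last(3).by("host").count_by_status()
```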
Event Alert Query
Example: events('sources:nagios status:error,warning priority:normal tags: "string query"').rollup("count").last("1h")
- event: the event query string, built from:
  - string_query: free text query to match against event title and text.
  - sources: event sources (comma-separated).
  - status: event statuses (comma-separated). Valid options: error, warn, and info.
  - priority: event priorities (comma-separated). Valid options: low, normal, all.
  - host: event reporting host (comma-separated).
  - tags: event tags (comma-separated).
  - excluded_tags: excluded event tags (comma-separated).
- rollup: the stats roll-up method. count is the only supported method now.
- last: the timeframe to roll up the counts. Examples: 45m, 4h. Supported timeframes: m, h and d. This value should not exceed 48 hours.
NOTE: Only available on US1 and EU.
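As an illustration of the 48-hour limit on last, the sketch below builds an event alert query from a few of the fields listed above and rejects timeframes that are too long. `event_alert_query` is a hypothetical helper; it covers only sources, status, and priority, and joining the fields with single spaces follows the example query rather than a documented grammar.

```python
import re

MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 1440}
MAX_MINUTES = 48 * 60  # the timeframe should not exceed 48 hours


def event_alert_query(sources=(), statuses=(), priorities=(), last="1h"):
    """Build events('...').rollup("count").last("...") for an event alert."""
    match = re.fullmatch(r"(\d+)([mhd])", last)
    if not match:
        raise ValueError(f"timeframe must look like 45m, 4h, or 1d: {last!r}")
    minutes = int(match.group(1)) * MINUTES_PER_UNIT[match.group(2)]
    if minutes > MAX_MINUTES:
        raise ValueError("timeframe should not exceed 48 hours")
    parts = []
    if sources:
        parts.append("sources:" + ",".join(sources))
    if statuses:
        parts.append("status:" + ",".join(statuses))
    if priorities:
        parts.append("priority:" + ",".join(priorities))
    search = " ".join(parts)
    # count is the only supported roll-up method.
    return f"events('{search}').rollup(\"count\").last(\"{last}\")"


print(event_alert_query(sources=["nagios"], statuses=["error"],
                        priorities=["normal"], last="1h"))
# prints: events('sources:nagios status:error priority:normal').rollup("count").last("1h")
```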
Event V2 Alert Query
Example: events(query).rollup(rollup_method[, measure]).last(time_window) operator #
- query: The search query, following the Log search syntax.
- rollup_method: The stats roll-up method; supports count, avg, and cardinality.
- measure: For the avg and cardinality rollup_method, specify the measure or the facet name you want to use.
- time_window: #m (between 1 and 2880), #h (between 1 and 48).
- operator: <, <=, >, >=, ==, or !=.
- #: an integer or decimal number used to set the threshold.
NOTE: Only available on US1-FED, US3, and in closed beta on EU and US1.
Process Alert Query
Example: processes(search).over(tags).rollup('count').last(timeframe) operator #
- search: free text search string for querying processes. Matching processes correspond to the results shown on the Live Processes page.
- tags: one or more tags (comma-separated)
- timeframe: the timeframe to roll up the counts. Examples: 10m, 4h. Supported timeframes: s, m, h and d
- operator: <, <=, >, >=, ==, or !=
- #: an integer or decimal number used to set the threshold
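A process alert query can be put together the same way. In this sketch, `process_alert_query` is a hypothetical helper, and wrapping the search string, tags, and timeframe in single quotes is an assumption modeled on the quoted 'count' in the example above.

```python
import re

OPERATORS = {"<", "<=", ">", ">=", "==", "!="}


def process_alert_query(search, tags, timeframe, operator, threshold):
    """Build processes(search).over(tags).rollup('count').last(timeframe) operator #."""
    if operator not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    # Supported timeframe units are s, m, h and d.
    if not re.fullmatch(r"\d+[smhd]", timeframe):
        raise ValueError(f"unsupported timeframe: {timeframe!r}")
    tag_list = ",".join(tags)
    return (f"processes('{search}').over('{tag_list}')"
            f".rollup('count').last('{timeframe}') {operator} {threshold}")


print(process_alert_query("java", ["env:prod"], "10m", ">", 5))
# prints: processes('java').over('env:prod').rollup('count').last('10m') > 5
```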
Logs Alert Query
Example: logs(query).index(index_name).rollup(rollup_method[, measure]).last(time_window) operator #
- query: The search query, following the Log search syntax.
- index_name: For multi-index organizations, the log index in which the request is performed.
- rollup_method: The stats roll-up method; supports count, avg, and cardinality.
- measure: For the avg and cardinality rollup_method, specify the measure or the facet name you want to use.
- time_window: #m (between 1 and 2880), #h (between 1 and 48).
- operator: <, <=, >, >=, ==, or !=.
- #: an integer or decimal number used to set the threshold.
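The rules above (a measure is needed for avg and cardinality, and the time window has fixed bounds) can be enforced before the query is sent. The following is a sketch with hypothetical helper names; double-quoting each argument is an assumption, and the search query is expected not to contain double quotes itself.

```python
import re

ROLLUPS = {"count", "avg", "cardinality"}


def check_time_window(time_window):
    """Accept #m (1 to 2880) or #h (1 to 48); raise ValueError otherwise."""
    match = re.fullmatch(r"(\d+)([mh])", time_window)
    if not match:
        raise ValueError(f"time_window must look like 5m or 1h: {time_window!r}")
    value, unit = int(match.group(1)), match.group(2)
    limit = 2880 if unit == "m" else 48
    if not 1 <= value <= limit:
        raise ValueError(f"time_window out of range: {time_window!r}")


def logs_alert_query(query, rollup_method, time_window, operator, threshold,
                     index_name=None, measure=None):
    """Build logs(query)[.index(name)].rollup(method[, measure]).last(window) operator #."""
    if rollup_method not in ROLLUPS:
        raise ValueError(f"unsupported rollup_method: {rollup_method!r}")
    if rollup_method != "count" and not measure:
        raise ValueError("avg and cardinality need a measure or facet name")
    check_time_window(time_window)
    parts = [f'logs("{query}")']
    if index_name:
        # Only relevant for multi-index organizations.
        parts.append(f'.index("{index_name}")')
    rollup_args = f'"{rollup_method}"'
    if measure:
        rollup_args += f', "{measure}"'
    parts.append(f'.rollup({rollup_args}).last("{time_window}")')
    return "".join(parts) + f" {operator} {threshold}"


print(logs_alert_query("status:error", "count", "5m", ">", 100, index_name="main"))
# prints: logs("status:error").index("main").rollup("count").last("5m") > 100
```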
Composite Query
Example: 12345 && 67890, where 12345 and 67890 are the IDs of non-composite monitors
- name [required, default = dynamic, based on query]: The name of the alert.
- message [required, default = dynamic, based on query]: A message to include with notifications for this monitor. Email notifications can be sent to specific users by using the same '@username' notation as events.
- tags [optional, default = empty list]: A list of tags to associate with your monitor. When getting all monitor details via the API, use the monitor_tags argument to filter results by these tags. It is only available via the API and isn't visible or editable in the Datadog UI.
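The composite example above joins monitor IDs with &&. A minimal sketch of producing that string from a list of IDs follows; `composite_query` is a hypothetical helper and handles only the && form shown in the example.

```python
def composite_query(monitor_ids):
    """Join the IDs of non-composite monitors with && (e.g. 12345 && 67890)."""
    ids = [int(monitor_id) for monitor_id in monitor_ids]
    if len(ids) < 2:
        raise ValueError("a composite query combines at least two monitors")
    return " && ".join(str(monitor_id) for monitor_id in ids)


print(composite_query([12345, 67890]))
# prints: 12345 && 67890
```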
SLO Alert Query
Example: error_budget("slo_id").over("time_window") operator #
- slo_id: The alphanumeric SLO ID of the SLO you are configuring the alert for.
- time_window: The time window of the SLO target you wish to alert on. Valid options: 7d, 30d, 90d.
- operator: >= or >.
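Because the SLO alert form accepts only three time windows and two operators, a small builder can validate both. This is a sketch: `slo_alert_query` is a hypothetical helper and the SLO ID in the usage line is a made-up placeholder.

```python
SLO_WINDOWS = {"7d", "30d", "90d"}
SLO_OPERATORS = {">=", ">"}


def slo_alert_query(slo_id, time_window, operator, threshold):
    """Build error_budget("slo_id").over("time_window") operator #."""
    if time_window not in SLO_WINDOWS:
        raise ValueError(f"time_window must be one of {sorted(SLO_WINDOWS)}")
    if operator not in SLO_OPERATORS:
        raise ValueError("operator must be >= or >")
    return f'error_budget("{slo_id}").over("{time_window}") {operator} {threshold}'


# "abc123" is a placeholder, not a real SLO ID.
print(slo_alert_query("abc123", "7d", ">", 75))
# prints: error_budget("abc123").over("7d") > 75
```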
To learn more, visit the Datadog documentation.
Basic Parameters
Parameter | Description |
---|---|
Message | A message to include with notifications for this monitor. |
Name | The monitor name. |
Query | The monitor query. |
Type | The type of the monitor. |
Advanced Parameters
Parameter | Description |
---|---|
Enable Logs Sample | Whether or not to send a log sample when the log monitor triggers. |
Priority | Integer from 1 (high) to 5 (low) indicating alert severity. |
Restricted Roles | A list of role identifiers that can be pulled from the Roles API. Cannot be used with the locked option. |
Tags | Tags associated with your monitor. |
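The basic and advanced parameters above can be gathered into one dictionary before the step runs. The sketch below is illustrative only: `build_monitor_params` is a hypothetical helper, and the key names are assumed to mirror the field names in the example output (name, type, query, message, priority, tags, restricted_roles, options.enable_logs_sample, options.locked), which this page does not confirm as the input format.

```python
def build_monitor_params(name, monitor_type, query, message, priority=None,
                         tags=None, restricted_roles=None,
                         enable_logs_sample=None, locked=None):
    """Collect monitor parameters into a dict, applying the documented constraints."""
    if priority is not None and priority not in range(1, 6):
        raise ValueError("priority is an integer from 1 (high) to 5 (low)")
    if restricted_roles and locked:
        raise ValueError("restricted_roles cannot be used with the locked option")
    params = {"name": name, "type": monitor_type,
              "query": query, "message": message}
    if priority is not None:
        params["priority"] = priority
    if tags:
        params["tags"] = list(tags)
    if restricted_roles:
        params["restricted_roles"] = list(restricted_roles)
    options = {}
    if enable_logs_sample is not None:
        options["enable_logs_sample"] = bool(enable_logs_sample)
    if locked is not None:
        options["locked"] = bool(locked)
    if options:
        params["options"] = options
    return params


params = build_monitor_params(
    name="High inbound traffic",
    monitor_type="metric alert",
    query="avg(last_5m):sum:system.net.bytes_rcvd{host:host0} > 100",
    message="Inbound traffic is above the threshold.",
    priority=3,
    tags=["team:network"],
)
print(params["type"], params["priority"])
# prints: metric alert 3
```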
Example Output
{
  "created": "2000-01-23T04:56:07.000+00:00",
  "creator": {
    "email": "email",
    "handle": "handle",
    "name": "name"
  },
  "deleted": "2000-01-23T04:56:07.000+00:00",
  "id": 0,
  "message": "message",
  "modified": "2000-01-23T04:56:07.000+00:00",
  "multi": true,
  "name": "name",
  "options": {
    "aggregation": {
      "group_by": "host",
      "metric": "metrics.name",
      "type": "count"
    },
    "device_ids": [
      null,
      null
    ],
    "enable_logs_sample": true,
    "escalation_message": "none",
    "evaluation_delay": 6,
    "groupby_simple_monitor": true,
    "include_tags": true,
    "locked": true,
    "min_failure_duration": 1055,
    "min_location_failed": 5,
    "new_host_delay": 5,
    "no_data_timeframe": 2,
    "notify_audit": false,
    "notify_no_data": false,
    "renotify_interval": 7,
    "require_full_window": true,
    "silenced": {
      "key": 9
    },
    "synthetics_check_id": "synthetics_check_id",
    "threshold_windows": {
      "recovery_window": "recovery_window",
      "trigger_window": "trigger_window"
    },
    "thresholds": {
      "critical": 3.616076749251911,
      "critical_recovery": 2.027123023002322,
      "ok": 4.145608029883936,
      "unknown": 7.386281948385884,
      "warning": 1.2315135367772556,
      "warning_recovery": 1.0246457001441578
    },
    "timeout_h": 1
  },
  "priority": 3,
  "query": "avg(last_5m):sum:system.net.bytes_rcvd{host:host0} \u003e 100",
  "restricted_roles": [
    "restricted_roles",
    "restricted_roles"
  ],
  "state": {
    "groups": {
      "key": {
        "last_nodata_ts": 7,
        "last_notified_ts": 1,
        "last_resolved_ts": 4,
        "last_triggered_ts": 5,
        "name": "name"
      }
    }
  },
  "tags": [
    "tags",
    "tags"
  ],
  "type": "metric alert"
}
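A later step will often need only a few fields from this output. The sketch below, using a hypothetical `summarize_monitor` helper and a trimmed copy of the sample, reads the ID, query, and thresholds with Python's standard json module. Note that the `\u003e` escape in the query decodes to `>`.

```python
import json


def summarize_monitor(raw_json):
    """Pick out the fields a follow-up step is most likely to need."""
    monitor = json.loads(raw_json)
    thresholds = monitor.get("options", {}).get("thresholds", {})
    return {
        "id": monitor["id"],
        "name": monitor["name"],
        "type": monitor["type"],
        "query": monitor["query"],
        "critical": thresholds.get("critical"),
        "warning": thresholds.get("warning"),
    }


# Trimmed version of the example output; the raw string keeps \u003e intact
# so that json.loads performs the unescaping.
sample = r'''{
  "id": 0,
  "name": "name",
  "type": "metric alert",
  "query": "avg(last_5m):sum:system.net.bytes_rcvd{host:host0} \u003e 100",
  "options": {"thresholds": {"critical": 3.5}}
}'''

summary = summarize_monitor(sample)
print(summary["query"])
# prints: avg(last_5m):sum:system.net.bytes_rcvd{host:host0} > 100
```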
Workflow Library Example
Create Datadog Monitor for Kubernetes Namespace