content-moderation/moderate

Classify text and/or images for unsafe content; returns per-category flags and scores. Pass input as a string for text, or as an array of {type:text,...} / {type:image_url,...} objects to moderate images.

0.1 credits / call ($0.0001)

charged on success

What it does

Screens content for policy-violating material across categories including sexual content, hate, harassment, self-harm, violence, and illicit behavior, returning a boolean flag and a 0-1 confidence score for each category plus an overall flagged verdict.

Primary use cases

Pre-screening user-generated text or images before they are published, stored, or shown to other users
Gatekeeping an AI agent's own generated output before it is returned to an end user
Triaging a moderation queue by ranking items on category confidence scores
Enforcing app-specific policies by thresholding on individual category scores
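
The triage and thresholding use cases above can be sketched in a few lines of Python. This is a minimal sketch, not the tool's own client: the result shape (`flagged`, `categories`, `category_scores`), the category names, the threshold values, and the helper names `violated_categories` and `rank_for_review` are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical per-category thresholds; tune these to your own policy.
THRESHOLDS: Dict[str, float] = {"violence": 0.5, "harassment": 0.7}


def violated_categories(result: dict, thresholds: Dict[str, float]) -> List[str]:
    """Return the categories whose score meets or exceeds its threshold.

    Assumes `result` carries a "category_scores" mapping of name -> 0-1 score.
    """
    scores = result.get("category_scores", {})
    return sorted(
        name for name, limit in thresholds.items()
        if scores.get(name, 0.0) >= limit
    )


def rank_for_review(items: List[dict]) -> List[dict]:
    """Order queue items so the highest top-category score comes first.

    Assumes each item looks like {"id": ..., "result": {"category_scores": {...}}}.
    """
    return sorted(
        items,
        key=lambda item: max(item["result"]["category_scores"].values(), default=0.0),
        reverse=True,
    )
```

Keeping the thresholds in a plain mapping means a policy change is a data change rather than a code change, and categories absent from the mapping are simply not enforced.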

Why use this tool

It is a fast, multimodal safety classifier covering many harm categories in a single call, with calibrated per-category scores so you can set your own thresholds rather than relying on a single yes/no. Because it returns a verdict rather than blocking, you stay in control of the policy decision.

Good to know

Pass input as a plain string for text-only checks, or as an array of typed parts ({"type":"text","text":...} and {"type":"image_url","image_url":{"url":...}}) to include images. The default model is multimodal; this tool surfaces the verdict and does not block on your behalf.
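
As a rough illustration of the two input shapes, the helper below returns a plain string when there are no images and an array of typed parts otherwise. The function name `build_input` is hypothetical; only the part shapes come from the description above.

```python
from typing import List, Optional, Union


def build_input(
    text: Optional[str] = None,
    image_urls: Optional[List[str]] = None,
) -> Union[str, List[dict]]:
    """Build the `input` value: a string for text-only, typed parts otherwise."""
    image_urls = image_urls or []
    if not image_urls:
        if text is None:
            raise ValueError("provide text, at least one image URL, or both")
        return text  # text-only checks can use a plain string

    parts: List[dict] = []
    if text is not None:
        parts.append({"type": "text", "text": text})
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts
```

For example, `build_input("hello")` yields the string unchanged, while passing `image_urls` switches the result to the array form.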

Parameters

input (required)

Text to classify (a string), or an array of typed parts for multimodal input, e.g. [{"type":"text","text":"..."},{"type":"image_url","image_url":{"url":"https://..."}}].

model (string, optional, default: "omni-moderation-latest")

Moderation model. omni-moderation-latest is multimodal (text + images); text-moderation-latest is text-only.
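
One way to assemble the two parameters before a call is sketched below. The helper names `has_image` and `build_arguments` are hypothetical, and rejecting image parts for the text-only model on the client side is an assumption about sensible behavior, not documented behavior of the tool.

```python
from typing import List, Optional, Union

# Models documented as text-only.
TEXT_ONLY_MODELS = {"text-moderation-latest"}


def has_image(input_value: Union[str, List[dict]]) -> bool:
    """True if the input is an array containing at least one image_url part."""
    if isinstance(input_value, str):
        return False
    return any(part.get("type") == "image_url" for part in input_value)


def build_arguments(
    input_value: Union[str, List[dict]],
    model: Optional[str] = None,
) -> dict:
    """Return the argument mapping, omitting `model` so the default applies."""
    if model in TEXT_ONLY_MODELS and has_image(input_value):
        # Assumption: fail early rather than send images to a text-only model.
        raise ValueError(f"{model} is text-only but the input contains images")
    args: dict = {"input": input_value}
    if model is not None:
        args["model"] = model
    return args
```

Leaving `model` out of the mapping when it is not set lets the documented default, omni-moderation-latest, take effect.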