Classifies text and/or images for unsafe content and returns per-category flags and scores. Pass input as a string for text, or as an array of {"type":"text",...} / {"type":"image_url",...} objects to moderate images.
Screens content for policy-violating material across categories including sexual content, hate, harassment, self-harm, violence, and illicit behavior, returning a boolean flag and a 0-1 confidence score for each category plus an overall flagged verdict.
It is a fast, multimodal safety classifier covering many harm categories in a single call, with calibrated per-category scores so you can set your own thresholds rather than relying on a single yes/no. Because it returns a verdict rather than blocking, you stay in control of the policy decision.
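As a rough illustration of applying your own thresholds to the per-category scores, here is a minimal Python sketch. The result shape (a dict with "flagged" and "category_scores" keys), the category key names, the threshold values, and the helper names categories_over_threshold and decide are all assumptions made for this example, not something the tool guarantees.

```python
from typing import Dict, List

# Hypothetical per-category thresholds; the key names and values are
# assumptions for illustration and should be tuned to your own policy.
THRESHOLDS: Dict[str, float] = {
    "sexual": 0.5,
    "hate": 0.4,
    "harassment": 0.6,
    "self-harm": 0.2,
    "violence": 0.7,
    "illicit": 0.5,
}
DEFAULT_THRESHOLD = 0.5  # used for any category not listed above


def categories_over_threshold(
    scores: Dict[str, float],
    thresholds: Dict[str, float] = THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return the sorted names of categories whose score meets its threshold."""
    return sorted(
        name
        for name, score in scores.items()
        if score >= thresholds.get(name, default)
    )


def decide(result: dict) -> dict:
    """Turn a moderation result into a local policy decision.

    Assumes `result` has a boolean "flagged" key and a "category_scores"
    mapping of category name to a 0-1 score.
    """
    hits = categories_over_threshold(result.get("category_scores", {}))
    return {
        "block": bool(hits),  # our own decision, based only on thresholds
        "reasons": hits,      # categories that crossed their threshold
        "model_flagged": bool(result.get("flagged", False)),
    }


if __name__ == "__main__":
    sample = {
        "flagged": False,
        "category_scores": {"violence": 0.75, "hate": 0.1, "self-harm": 0.25},
    }
    print(decide(sample))
```

Keeping the tool's own overall verdict alongside the thresholded decision lets you log cases where the two disagree and adjust thresholds over time.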
Pass input as a plain string for text-only checks, or as an array of typed parts ({"type":"text","text":...} and {"type":"image_url","image_url":{"url":...}}) to include images. The default model is multimodal; this tool surfaces the verdict and does not block on your behalf.
Text to classify (a string), or an array of typed parts for multimodal input, e.g. [{"type":"text","text":"..."},{"type":"image_url","image_url":{"url":"https://..."}}].
Moderation model. omni-moderation-latest is multimodal (text + images); text-moderation-latest is text-only.
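The two input shapes and the model choice described above can be sketched as a small argument builder. This is only an illustration: the helper names (text_part, image_part, build_request), the "model" argument key, and the choice of omni-moderation-latest as the default are assumptions, and the example URL is a placeholder.

```python
from typing import Iterable, Optional


def text_part(text: str) -> dict:
    """Build a typed text part."""
    return {"type": "text", "text": text}


def image_part(url: str) -> dict:
    """Build a typed image part from an image URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def build_request(
    text: Optional[str] = None,
    image_urls: Iterable[str] = (),
    model: str = "omni-moderation-latest",  # assumed default for this sketch
) -> dict:
    """Assemble tool arguments: a plain string for text-only input,
    or an array of typed parts when images are included."""
    urls = list(image_urls)
    if not text and not urls:
        raise ValueError("provide text, at least one image URL, or both")
    if urls and model.startswith("text-moderation"):
        # The text-only model cannot moderate images.
        raise ValueError(f"{model} is text-only; use a multimodal model for images")
    if not urls:
        # Text-only check: pass the string directly.
        return {"model": model, "input": text}
    parts = [text_part(text)] if text else []
    parts.extend(image_part(u) for u in urls)
    return {"model": model, "input": parts}


if __name__ == "__main__":
    print(build_request("some user comment"))
    print(build_request("a caption", ["https://example.com/photo.png"]))
```

Rejecting image input for the text-only model up front surfaces the mismatch before any call is made.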