PurpleLlama (by FaceBook)
This is an umbrella project that over time will bring together tools and evals to help the community build responsibly with open generative AI models. The initial release will include tools and evals for Cyber Security and Input/Output safeguards but we plan to contribute more in the near future.
Lakera Guard
It empowers organizations to build GenAI applications without worrying about prompt injections, data loss, harmful content, and other LLM risks.
Rebuff
It is designed to protect AI applications from prompt injection (PI) attacks through a multi-layered defense (Heuristics, LLM-based detection, VectorDB, Canary tokens).
Garak
It checks if an LLM can be made to fail in an way we don't want. garak probes for hallucination, data leakage, prompt injection, misinformation, toxicity generation, jailbreaks, and many other weaknesses. If you know nmap, it's nmap for LLMs.
LLM-Guard
It ensures that your interactions with LLMs remain safe and secure, by offering sanitization, detection of harmful language, prevention of data leakage, and resistance against prompt injection attacks.
Vigil-LLM
This is a Python library and REST API for assessing Large Language Model prompts and responses against a set of scanners to detect prompt injections, jailbreaks, and other potential threats.
Plexiglass
This is a toolkit for detecting and protecting against vulnerabilities in Large Language Models (LLMs). It is a simple command line interface (CLI) tool which allows users to quickly test LLMs against adversarial attacks such as prompt injection, jailbreaking and more. Plexiglass also allows security, bias and toxicity benchmarking of multiple LLMs by scraping latest adversarial prompts such as jailbreakchat.com and wiki_toxic.
NeMo (by Nvidia)
This is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational applications. Guardrails (or "rails" for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.