// Components
import React, {useEffect} from "react";
import DocumentationNav from "./DocumentationNav";
import BaseButton from "../../components/base/Button";
import RiskLabel from "../../components/labels/RiskLabel";
import InjectionOverview from "./injections/InjectionOverview";
import AdversarialPromptingLabel from "../../components/labels/AdversarialPromptingLabel";

function AdversarialPromptingDocumentation() {
  useEffect(() => {
    document.title = "Crafting AI Prompts Framework - Adversarial Prompting";
  }, []);

  return (
    <>
      <section className="w-full pb-24 flex dark:bg-slate-900 dark:text-white" id="adversarial-prompting">
        <div className="relative max-w-screen-xl px-4 sm:px-8 mx-auto grid grid-cols-12 gap-x-6">
              <DocumentationNav page="AdversarialPrompting" />

              <div className="col-span-12 lg:col-span-9 space-y-6 px-4 sm:px-6 mt-20 flex-1 dark:text-white content-section" id="adversarial-prompting-exp"
                   data-aos="fade-up"
                   data-aos-delay="100">
                <h1 className="text-bold"><span className="text-header-gradient">Adversarial</span> Prompting</h1>
                <p>Adversarial prompting covers a range of techniques, including prompt injection, prompt leaking, and prompt jailbreaking, that exploit vulnerabilities in natural language processing (NLP) systems and language models. These attacks manipulate input prompts to elicit harmful, unintended, or sensitive output from a model. Because they abuse the model's own text generation capabilities, they can lead to security breaches, privacy violations, and inappropriate or dangerous content, which is why NLP systems need robust safeguards and monitoring when deployed.</p>
                <p>To mitigate adversarial prompting attacks, apply security measures and validation checks wherever user-generated prompts or input are accepted. These measures may include content moderation, filtering, and anomaly detection to identify and block malicious or inappropriate prompts. Continuous monitoring and regular updates to the language models can further reduce exposure to these attacks.</p>

                <div className={"block flex col-2 mb-1"} id="prompt-injections">
                  <h2 className="text-bold mb-2">Prompt <span className="text-header-gradient">Injections</span></h2>
                  <div className={"mt-2 ml-4 pt-1"}>
                    <AdversarialPromptingLabel category={"PI"} />
                  </div>
                </div>

                <p>A prompt injection attack is a security exploit in which malicious instructions are inserted, often subtly, into the input of a natural language processing (NLP) system or language model. The manipulation can happen in ways the user might overlook, leading the model to generate harmful or unintended output. The attack exploits the language model's ability to generate text based on the prompts it is given, which can be abused to produce content that poses significant risks.</p>

                <div className={"block flex col-2 mb-1"} id="prompt-leaking">
                  <h2 className="text-bold mb-2">Prompt <span className="text-header-gradient">Leaking</span></h2>
                  <div className={"mt-2 ml-4 pt-1"}>
                    <AdversarialPromptingLabel category={"PL"} />
                  </div>
                </div>
                <p>Prompt leaking occurs when a large language model (LLM) discloses sensitive or private information that it was not meant to reveal. This can include data used during training or fine-tuning as well as the complete system prompt, which may contain intellectual property (IP). Such leaks can happen when the model is queried in ways that expose this underlying information, leading to serious privacy and security concerns.</p>

                <div className={"block flex col-2 mb-1"} id="prompt-jailbreaking">
                  <h2 className="text-bold mb-2">Prompt <span className="text-header-gradient">Jailbreaking</span></h2>
                  <div className={"mt-2 ml-4 pt-1"}>
                    <AdversarialPromptingLabel category={"PJ"} />
                  </div>
                </div>
                <p>Prompt jailbreaking involves crafting prompts that circumvent the restrictions or safety measures implemented in NLP models. With cleverly designed prompts, attackers can manipulate the model into generating prohibited, harmful, or inappropriate content, bypassing the built-in safeguards. The technique exploits the model's flexibility in understanding and generating text, leading to outputs that the system's designers intended to prevent.</p>

                <hr />

                <h2 className="text-bold" id={"risk-impact-score"}>Adversarial prompting <span className="text-header-gradient">risk and impact scores</span></h2>
                <p>Each scenario, or "injection," is rated on two distinct metrics:</p>
                <ul className={"list-disc px-10"}>
                  <li className={"pb-5"}><strong>Risk score:</strong> <RiskLabel category={"risk"} classifier={"HIGH"} />, <RiskLabel category={"risk"} classifier={"MEDIUM"} />, <RiskLabel category={"risk"} classifier={"LOW"} />: This score estimates how likely the scenario is to occur.</li>
                  <li><strong>Impact score:</strong> <RiskLabel category={"impact"} classifier={"HIGH"} />, <RiskLabel category={"impact"} classifier={"MEDIUM"} />, <RiskLabel category={"impact"} classifier={"LOW"} />: This score assesses how severe the consequences would be if the scenario does occur.</li>
                </ul>
                <p>Each injection is tagged with these scores at the end of its title, giving an at-a-glance view of both its likelihood and its potential impact.</p>
                <p className={"pb-5"}><strong>These evaluations are subjective, based on my personal analysis and reasoning, which I explain at the end of each description</strong>. Scoring both dimensions helps readers gauge each scenario's probability and its potential ramifications.</p>

                <hr className={"pb-5"} />

                <InjectionOverview />

                <p></p>
                <BaseButton url={"/documentation/adversarial-prompting/prompt-injections#prompt-injections"} styles="max-w-full px-8 py-4 bg-gradient-to-r from-[#468ef9] to-[#0c66ee] border border-[#0c66ee] text-white">
                  Next: Prompt Injections
                </BaseButton>

              </div>
        </div>
      </section>
    </>
  );
}

export default AdversarialPromptingDocumentation;
