MELIORATE: NLP techniques

Sunday, February 23, 2025

How AI Picks Its Words: Top P and K Unraveled!

1. Ever wondered how an AI decides what to say next? Two cool tricks it uses are called Top P and Top K. They’re like filters that help the AI choose words—whether it sticks to safe bets or gets a little wild. Let’s break them down with examples, no tech jargon needed!

Top P: The Probability Party

2. Suppose the AI is completing "The cat is ___" and has a list of word choices, each with a probability of being selected:

"soft" (40% probability)
"cute" (30% probability)
"lazy" (20% probability)
"sneaky" (5% probability)
"wild" (5% probability)

3. Top P (also known as nucleus sampling) says: "Only consider the smallest set of most likely words whose probabilities add up to at least P, say 80%." So:

With P = 0.8, it adds up the highest probabilities in order: "soft" (40%) + "cute" (30%) = 70%, which is not enough yet, so it adds "lazy" (20%) for a total of 90%. That reaches 80%, so it picks only from "soft," "cute," and "lazy," with likelier words still picked more often. "Sneaky" and "wild" don't qualify.

4. Result? Perhaps "The cat is cute."
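Here is a minimal Python sketch of that Top P step for the cat example. It assumes the word probabilities are already known (a real model computes them) and writes them as whole-number percentages so the running total stays exact. The names `top_p_filter` and `sample_top_p` are made up for illustration, not any library's API.

```python
import random

# Toy next-word probabilities for "The cat is ___", in whole-number percentages.
CAT_PROBS = {"soft": 40, "cute": 30, "lazy": 20, "sneaky": 5, "wild": 5}


def top_p_filter(probs, p_percent):
    """Return the smallest set of most likely words whose probabilities add up to at least p_percent."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus = {}
    total = 0
    for word, prob in ranked:
        nucleus[word] = prob
        total += prob
        if total >= p_percent:  # stop as soon as the running total reaches P
            break
    return nucleus


def sample_top_p(probs, p_percent, rng=random):
    """Filter with Top P, then draw one word in proportion to the kept probabilities."""
    nucleus = top_p_filter(probs, p_percent)
    words = list(nucleus)
    # random.choices normalizes the weights, so 40/30/20 act like 40/90, 30/90, 20/90.
    return rng.choices(words, weights=[nucleus[w] for w in words], k=1)[0]


print(top_p_filter(CAT_PROBS, 80))  # {'soft': 40, 'cute': 30, 'lazy': 20}
print("The cat is", sample_top_p(CAT_PROBS, 80) + ".")
```

The second line changes from run to run, but it can only ever end in "soft," "cute," or "lazy."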

Range: Top P is a probability between 0 and 1 (in practice, values from about 0.1 to 0.95 are common).

Low P (such as 0.3): Very picky; it keeps only the most obvious choice ("The cat is soft").

High P (such as 0.95): Braver; it may let "sneaky" creep in.
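To see how the shortlist grows with P, this sketch reuses the same toy percentages and a hypothetical helper, `top_p_words`, and prints which words survive at a few settings. Notice that at exactly 0.9 the top three words already add up to 90%, so "sneaky" only gets in once P goes above 0.9.

```python
# Same toy distribution for "The cat is ___", in whole-number percentages.
CAT_PROBS = {"soft": 40, "cute": 30, "lazy": 20, "sneaky": 5, "wild": 5}


def top_p_words(probs, p_percent):
    """Words kept by Top P: most likely first, until their total reaches p_percent."""
    kept, total = [], 0
    for word, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append(word)
        total += prob
        if total >= p_percent:
            break
    return kept


for p in (30, 80, 90, 95):
    print(f"P = {p / 100}: {top_p_words(CAT_PROBS, p)}")
# P = 0.3: ['soft']
# P = 0.8: ['soft', 'cute', 'lazy']
# P = 0.9: ['soft', 'cute', 'lazy']
# P = 0.95: ['soft', 'cute', 'lazy', 'sneaky']
```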

It's like telling the AI, "Keep inviting the most popular words to the party until together they account for 80% of the total popularity; everyone else stays home!"

Top K: The VIP List

Top K is even simpler. It keeps only the K most probable words and picks among them. Same setup: "The cat is ___" with the same choices.

When K = 3, it keeps the top 3: "soft," "cute," and "lazy." Then it rolls the dice, weighted by their probabilities, and picks one.
What happens? Maybe "The cat is lazy."
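The Top K step is even shorter to sketch. As before, this is a toy under assumed probabilities, and `top_k_filter` and `sample_top_k` are made-up names. Drawing by weight rather than uniformly is one reasonable reading of "rolls the dice": likelier words still win more often.

```python
import random

# Toy next-word probabilities for "The cat is ___", in whole-number percentages.
CAT_PROBS = {"soft": 40, "cute": 30, "lazy": 20, "sneaky": 5, "wild": 5}


def top_k_filter(probs, k):
    """Keep only the k most probable words (every word, if k exceeds the list size)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])


def sample_top_k(probs, k, rng=random):
    """Filter with Top K, then draw one word in proportion to the kept probabilities."""
    kept = top_k_filter(probs, k)
    words = list(kept)
    return rng.choices(words, weights=[kept[w] for w in words], k=1)[0]


print(top_k_filter(CAT_PROBS, 3))  # {'soft': 40, 'cute': 30, 'lazy': 20}
print("The cat is", sample_top_k(CAT_PROBS, 3) + ".")
```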

Range: Top K is a whole number, typically somewhere around 5 to 50.

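Small K (such as 5): Focused and predictable.
Large K (such as 40): More choices, so it could say "The cat is wild" if "wild" ranks in the top 40. (A real model ranks tens of thousands of words and word pieces, so the top 5 is a much smaller slice than in this toy list.)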

Think of it as the AI drawing up a VIP list: "Only the top 3 (or 10, or 50) get in!"

How They Compare

Top P cares about cumulative probability. It's adaptive: sometimes it keeps 2 words, sometimes 5, depending on how many it takes for their probabilities to add up to P.

Top K cares about a fixed count. It's rigid: always K words, regardless of how probable they are.
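The difference is easiest to see on two made-up distributions: one where a single word dominates and one where the odds are spread almost evenly. The numbers are illustrative only, and `top_p_words` and `top_k_words` are hypothetical helpers.

```python
# Two made-up next-word distributions, in whole-number percentages (illustrative only).
PEAKED = {"blue": 85, "clear": 5, "cloudy": 4, "dark": 3, "purple": 3}
FLAT = {"blue": 22, "clear": 20, "cloudy": 20, "dark": 19, "purple": 19}


def ranked_words(probs):
    """Words sorted from most to least likely (ties keep their original order)."""
    return [w for w, _ in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)]


def top_p_words(probs, p_percent):
    """Adaptive: keep adding words until their probabilities reach p_percent."""
    kept, total = [], 0
    for word in ranked_words(probs):
        kept.append(word)
        total += probs[word]
        if total >= p_percent:
            break
    return kept


def top_k_words(probs, k):
    """Rigid: always the k most likely words."""
    return ranked_words(probs)[:k]


for name, dist in (("peaked", PEAKED), ("flat", FLAT)):
    print(f"{name}: Top P 0.8 -> {top_p_words(dist, 80)}, Top K 3 -> {top_k_words(dist, 3)}")
# peaked: Top P 0.8 -> ['blue'], Top K 3 -> ['blue', 'clear', 'cloudy']
# flat: Top P 0.8 -> ['blue', 'clear', 'cloudy', 'dark'], Top K 3 -> ['blue', 'clear', 'cloudy']
```

With the same P, Top P shrinks to one word when the model is confident and widens to four when it isn't; Top K keeps three either way.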

Example in Action

"The sky is ___": Choices are "blue" (40%), "clear" (30%), "cloudy" (20%), "dark" (5%), "purple" (5%).

P = 0.7: Takes "blue" (40%) + "clear" (30%) = 70%. Selects from those. Perhaps "The sky is clear."

K = 2: Takes "blue" and "clear." Same pool this time, but it's always precisely 2. Perhaps "The sky is blue."
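The sky example can be checked with the same kind of sketch (hypothetical helpers, toy numbers from the text). It also shows where the two part ways: raise P to 0.8 and Top P pulls in "cloudy," while K = 2 still stops at two words.

```python
# Toy next-word probabilities for "The sky is ___", in whole-number percentages.
SKY_PROBS = {"blue": 40, "clear": 30, "cloudy": 20, "dark": 5, "purple": 5}


def ranked_words(probs):
    """Words sorted from most to least likely."""
    return [w for w, _ in sorted(probs.items(), key=lambda kv: kv[1], reverse=True)]


def top_p_words(probs, p_percent):
    """Keep the most likely words until their probabilities add up to p_percent."""
    kept, total = [], 0
    for word in ranked_words(probs):
        kept.append(word)
        total += probs[word]
        if total >= p_percent:
            break
    return kept


def top_k_words(probs, k):
    """Keep exactly the k most likely words."""
    return ranked_words(probs)[:k]


print(top_p_words(SKY_PROBS, 70))  # ['blue', 'clear']
print(top_k_words(SKY_PROBS, 2))   # ['blue', 'clear']
print(top_p_words(SKY_PROBS, 80))  # ['blue', 'clear', 'cloudy']
```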

Why It Matters

These parameters control how creative or predictable the AI's output is. Low P or K = serious and focused. High P or K = more surprises (some of them bizarre!). The next time you chat with an AI, picture it flipping through its word list with Top P or Top K to set the tone. Whenever I dig into internals like these, I get excited to read further, dive deeper, and learn more.

Saturday, April 08, 2023

IS THERE ANY WATERMARKING TO IDENTIFY AI GENERATED TEXT?

With the rise of artificial intelligence (AI), there are growing concerns about the potential misuse of AI-generated text, such as fake news articles, fraudulent emails, or misleading social media posts. To address these concerns, watermarking techniques can be used to identify the source of AI-generated text and detect unauthorized modifications or tampering.

Watermarking is the process of embedding a unique identifier into digital content that can be used to verify the authenticity and ownership of that content. For AI-generated text, watermarking can provide a means of identifying the source of the text and ensuring its integrity.

There are several watermarking techniques available for AI-generated text. Here are three examples:

Linguistic patterns: This technique involves embedding a unique pattern of words or phrases into the text that is specific to the AI model or dataset used to generate the text. The pattern can be detected using natural language processing (NLP) techniques and used to verify the source of the text.
Embedding metadata: This technique involves embedding metadata, such as the name of the AI model, the date and time of generation, and the source of the data used to train the model, into the text. This information can be used to verify the source of the text and identify the AI model used to generate it.
Invisible watermarking: This technique involves embedding a unique identifier into the text that is invisible to the human eye but can be detected using digital analysis tools. The watermark can be used to verify the source of the text and detect any modifications or tampering.
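The "linguistic patterns" idea can be made concrete with a statistical watermark. The sketch below is a simplified, hypothetical take on the "green list" scheme published by Kirchenbauer et al. (2023): a secret key and the previous word pseudo-randomly mark about half the vocabulary as "green," the generator is pushed toward green words, and a detector holding the key counts them. Ordinary text lands near 50% green (a z-score near 0), while watermarked text scores far higher. The vocabulary, key, and toy generator are assumptions for illustration; real schemes operate on model tokens and often nudge probabilities rather than forbidding non-green words outright.

```python
import hashlib
import math
import random

# Hypothetical toy setup: a 500-word vocabulary, a secret key, and GAMMA, the
# share of the vocabulary that counts as "green" at each position.
VOCAB = [f"word{i}" for i in range(500)]
SECRET_KEY = "demo-secret"
GAMMA = 0.5


def is_green(prev_word, word, key=SECRET_KEY, gamma=GAMMA):
    """Pseudo-randomly place `word` on the green list, seeded by the key and the previous word."""
    digest = hashlib.sha256(f"{key}|{prev_word}|{word}".encode("utf-8")).digest()
    # Turn the first 8 bytes of the hash into a number in [0, 1) and compare it to gamma.
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def generate(n_words, watermark, rng):
    """Toy "model" that picks words uniformly at random.

    With watermark=True it picks only from the green list (a "hard" watermark).
    """
    words = ["<start>"]
    for _ in range(n_words):
        candidates = VOCAB
        if watermark:
            green = [w for w in VOCAB if is_green(words[-1], w)]
            candidates = green or VOCAB  # fall back if the green list is ever empty
        words.append(rng.choice(candidates))
    return words[1:]


def z_score(words, key=SECRET_KEY, gamma=GAMMA):
    """How many standard deviations the green-word count sits above chance (gamma * T)."""
    prev_words = ["<start>"] + words[:-1]
    green = sum(is_green(p, w, key, gamma) for p, w in zip(prev_words, words))
    t = len(words)
    return (green - gamma * t) / math.sqrt(t * gamma * (1 - gamma))


rng = random.Random(0)
marked = generate(100, watermark=True, rng=rng)
plain = generate(100, watermark=False, rng=rng)
print(round(z_score(marked), 2))  # 10.0, because every word is green
print(round(z_score(plain), 2))
```

Without the key, the green words look like any others, which is what makes the pattern hard to spot or forge.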

Overall, watermarking techniques for AI-generated text can provide a means of identifying the source of the text and detecting any unauthorized modifications or tampering. These techniques can be useful in addressing concerns about the potential misuse of AI-generated text and ensuring the authenticity and integrity of digital content.
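For the "invisible watermarking" item, one simple approach hides an identifier in zero-width Unicode characters that render as nothing. The sketch below assumes that encoding and a hypothetical identifier, `"model-x"`; `embed` and `extract` are illustrative names. It shows the mark being read back to identify the source, and also the limitation: anyone who retypes the text or filters those characters out removes the mark entirely.

```python
# Two zero-width Unicode characters stand in for the bits 0 and 1 (an illustrative choice).
ZERO_BIT = "\u200b"  # ZERO WIDTH SPACE
ONE_BIT = "\u200c"   # ZERO WIDTH NON-JOINER


def embed(text, identifier):
    """Hide `identifier` as invisible characters right after the first word of `text`."""
    bits = "".join(f"{byte:08b}" for byte in identifier.encode("utf-8"))
    hidden = "".join(ONE_BIT if bit == "1" else ZERO_BIT for bit in bits)
    first, sep, rest = text.partition(" ")
    return first + hidden + sep + rest


def extract(text):
    """Recover the hidden identifier, or return None if no watermark characters remain."""
    bits = "".join("1" if ch == ONE_BIT else "0" for ch in text if ch in (ZERO_BIT, ONE_BIT))
    usable = len(bits) - len(bits) % 8
    if usable == 0:
        return None
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return data.decode("utf-8", errors="replace")


original = "The cat is cute."
marked = embed(original, "model-x")
print(len(marked) - len(original))  # 56 (7 bytes x 8 bits, all invisible)
print(extract(marked))              # model-x
stripped = marked.replace(ZERO_BIT, "").replace(ONE_BIT, "")
print(extract(stripped))            # None
```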

In addition to watermarking techniques, there are other approaches that can be used to address concerns about the potential misuse of AI-generated text. For example, NLP techniques can be used to detect fake news articles or fraudulent emails, and AI models can be trained to identify and flag potentially harmful content.
