New Series: Mastering AI Workflow

Mastering Gemini Gems

In my first post, I talked about how I use Gemini Gems in my professional and personal life to boost productivity and efficiency. I promised to dive deeper into Gems, and today, I’m delivering on that.

What Exactly Are "Gems"?

Gems are customizable, pre-programmable, and context-specific chatbots. Once you set the context and "program" them with instructions, you can use them repeatedly without re-prompting or setting the context again.

Currently, when you go to Gemini Gems, you will see two different types. "New Gems," or AI Apps, appear at the top.

Screenshot of Gemini Gems UI showing New Gems/AI Apps at the top.

After scrolling down, you will find the Gem Manager with the "old" Gems listed under it. (If you are accessing this for the first time, you won’t see anything there yet, as shown in the screenshot below.)

Screenshot of the Gem Manager showing a blank state for first-time users.

New Gems / AI Apps are a fantastic new addition! But we will save those for another blog post. In this post, we are focusing on the original Gems.

So, what is a Gem? Think of it this way: a Gem is a standard Google Gemini chat to which you give a context, persona, objective, and so on only once. After that, the Gem can make a fresh copy of itself any time you need it in the future. You might ask, "Why does it make copies? Why can’t it all stay in one window?" Well, it can do that too if you need it: you can keep working in one chat, or have copies ready as well. Why the need for copies? Because every chat has a context window. As a conversation gets longer, it eventually exceeds the model’s immediate memory capacity, and the AI is bound to lose the thread of that persona. Using the same chat window for multiple projects is also bound to create issues. This is why it’s better to use Gems. But what are the other advantages?

The "Gem" Advantage: Efficiency at Scale

Why use a Gem instead of just starting a new chat? There are several major benefits:

Context Persistence: With a Gem, the persona is baked into the foundation. You don’t have to copy-paste your "expert persona" prompts every time you start a new session.
Organization & Project Management: If you’re like me and have hundreds of chats open, Gems act as a filing system. If I have a "Secondary Research Gem," all my various project sessions using that expert persona are grouped together. It saves you from digging through an endless chat history to find exactly where you left off.
Large Context Window: Gems run on Gemini models, which have offered context windows of roughly 1 million to 2 million tokens depending on the model. That is enough for a Gem to keep a large set of reference documents in view at once.
Living Document Connectivity: Unlike many other AI tools, Gems can stay linked to your Google Docs and Sheets. If you update a file, the Gem picks up that change automatically.
Reducing the Chaos: If you need the same role or persona for 10 different projects, using just one window for all of them creates unwanted chaos. Having dedicated sessions grouped under one Gem keeps your work organized.
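
To make the context-window point a bit more concrete, here is a rough back-of-the-envelope sketch in Python. Treat it as an illustration only: the 4-characters-per-token ratio is a common rule of thumb rather than Gemini’s actual tokenizer, the 1,000,000 budget is simply the lower figure mentioned above, and the function names are my own.

```python
# Rough estimate only: real token counts depend on the model's tokenizer.
CHARS_PER_TOKEN = 4          # assumed rule of thumb, not a Gemini constant
CONTEXT_BUDGET = 1_000_000   # assumed budget, the lower figure quoted above


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_budget(documents: list[str], budget: int = CONTEXT_BUDGET) -> bool:
    """Return True if the combined estimate stays within the budget."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total <= budget
```

The same arithmetic shows why a single long-running chat eventually drifts: every message adds to the total, while the budget stays fixed.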

The Gem Creation Process

When you go into the Gem Manager to create a new Gem, you’ll see a specific structure. You can provide a Name and a Description. Under the description, you have Instructions, and below that, you have Default Tools and Knowledge. (The "Default Tools" functionality was limited before, but Google has recently expanded it.)

Screenshot showing Name, Description, Instructions, and Knowledge structure.

In the right-hand column, you have a Preview window. In the Instructions section, you’ll see a tooltip. When you click it, it asks: "What are your Gem’s main objectives and capabilities, and what style of response do you want?" If you click the "Learn More" call-to-action (CTA), it takes you to Google Support, which explains how to write great instructions. Google suggests four main areas to consider, the same ones that go into a good prompt: Persona, Task, Context, and Format.

Screenshot of the Instruction tooltip and Google Support CTA.

The Secret Sauce: Frameworks for Gem Creation

The "Garbage In, Garbage Out" rule applies heavily here. A simple instruction like "Research ice cream" will give you a mediocre result. A structured instruction changes everything. While Google suggests the PTCF framework, I’ve been experimenting with four different methodologies:

PTCF
Persona, Task, Context, Format

ROADS
Role, Objective, Details, Examples, Style

RTF-AFR
Role, Task, Style, Algorithm, Format, Restrictions

PECTQCC-O
Persona, Expertise, Core Mission, Tone, Quality Rules, Constraints, Citing, Output
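
To show what "structured" means in practice, here is a minimal Python sketch that turns the bare "Research ice cream" instruction into a PTCF-style block of text. It does not call any Gemini API; it only builds text you could paste into the Instructions field. The class name, the "Label: value" layout, and every field value are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class PTCFInstruction:
    """The four PTCF fields; values are supplied by whoever writes the Gem."""
    persona: str
    task: str
    context: str
    format: str

    def render(self) -> str:
        """Join the fields into one labeled block of instruction text."""
        sections = [
            ("Persona", self.persona),
            ("Task", self.task),
            ("Context", self.context),
            ("Format", self.format),
        ]
        return "\n".join(f"{label}: {value.strip()}" for label, value in sections)


# Hypothetical values that expand the bare "Research ice cream" instruction.
instruction = PTCFInstruction(
    persona="You are a market researcher covering the food industry.",
    task="Research the ice cream market.",
    context="The findings feed a new product pitch.",
    format="A one-page summary with bullet points.",
)
print(instruction.render())
```

The other frameworks could be sketched the same way by swapping in their own field names.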

It is funny, though: if Google has given a specific framework for writing instructions, why are all these other frameworks out there? And are they really needed? My thinking is that some of these frameworks have unique strengths and might be better suited to particular tasks. Some of them I found by learning from other Gem tinkerers. One I found while asking Gemini itself which framework works best for Gemini Gems. Which is which? I’ll be covering those comparisons in my next post.

What happens if you don’t use a framework? In my experience, it can lead to unexpected behavior. For example, if you ask a "Web Developer" Gem to help create a website, it might cut some of your content even though that wasn’t part of the task. A framework isn’t a magic bullet, but in my experience it noticeably reduces hallucinations. For instance, if you provide a 100-page knowledge base, give clear instructions to rely only on that data, and turn "Grounding Search" off, the Gem tends to be much more accurate.

Case Study: The Accessibility Auditor Gem

I recently built an Accessibility Auditor Gem using the PECTQCC-O format:

Persona: A Principal Experience Consultant with 25 years of experience.
Expertise: WCAG 2.1 guidelines and digital audits for mobile/desktop.
Core Mission: Help design and non-design teams assess and audit work for digital accessibility, generate error reports, and suggest resolutions.
Tone: Friendly but objective and data-driven.
Knowledge Tool: I uploaded the official WCAG 2.1 documentation so the Gem stays grounded in facts.

The Goal: I want to upload a screenshot and have the Gem scan it for errors and mark them directly on top of the original image (e.g., flagging color contrast or heading order issues) using Gemini’s multimodal capabilities.

The Reality Check: While it works, it isn't perfect yet. Sometimes the visual markings don’t render as requested. Interestingly, Google’s new AI Apps allow for "chaining" instructions and multiple tools in sequence. I suspect my Accessibility Auditor will work much more effectively as an AI App, which I am currently experimenting with.
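
Because the Gem’s output isn’t always reliable, it helps to cross-check the findings that can be computed exactly. Color contrast is one of them. The sketch below follows the relative luminance and contrast ratio definitions from WCAG 2.1, with 4.5:1 as the AA threshold for normal-size text. The function names are my own, and this is not how the Gem works internally; it is only a way to verify a flagged color pair by hand.

```python
RGB = tuple[int, int, int]


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color, per the WCAG 2.1 definition."""
    def linearize(channel: int) -> float:
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(value) for value in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: RGB, background: RGB) -> float:
    """Contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa_normal_text(foreground: RGB, background: RGB) -> bool:
    """True if the pair meets the 4.5:1 AA minimum for normal-size text."""
    return contrast_ratio(foreground, background) >= 4.5
```

For example, `passes_aa_normal_text((119, 119, 119), (255, 255, 255))` checks mid-gray text on a white background, a pair that falls just under the threshold.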