When building an agent in Tess AI's AI Studio, you can go well beyond simple training. With Advanced Steps, you can create agents that perform preliminary tasks and process information from multiple sources before starting the conversation (chat agents) or delivering the final result (text agents).
When is a prompt alone not enough?
Imagine you want to create an agent that automates the generation of product descriptions, and all the information about each product lives in a PDF catalog.
If you just give the PDF file to the agent in the user input, it won't know what to do with it. The AI needs an instruction to first read and interpret the content of that file. That's exactly what Advanced Steps are for: they give your agent the ability to perform preliminary actions that complement your training with the context it needs.
Examples of Available Advanced Steps
You can equip your agent with a variety of “senses” and skills, including the following (the PDF-related steps are sketched in code after the list):
PDF text extraction: Allows the agent to read and extract all the text from a PDF document.
Image reading with OCR: A powerful skill to extract text that is inside images (like in a scanned flyer or a screenshot).
Reading selected pages of a PDF: Optimizes the process by letting you instruct the agent to focus only on the relevant pages of a long document.
Web scraping: Turns your agent into an “internet reader”, able to extract information from web pages, like the content of an article or data from an e-commerce site.
Google search: Lets the agent perform a Google search and use the results as a basis for its answer.
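To make the first and third items concrete, here is a minimal Python sketch of what PDF text extraction and selected-page reading do conceptually. This is not Tess AI's internal implementation; it uses the open-source pypdf library purely as an illustration, and the file name is hypothetical:

```python
# Illustrative only -- not Tess AI's internal implementation.
# Requires: pip install pypdf
from pypdf import PdfReader

def extract_pdf_text(path: str, pages: list[int] | None = None) -> str:
    """Extract text from a PDF, optionally restricted to selected pages."""
    reader = PdfReader(path)
    selected = reader.pages if pages is None else [reader.pages[i] for i in pages]
    return "\n".join(page.extract_text() or "" for page in selected)

# Read the whole catalog, or focus only on the two relevant pages of a long one:
full_text = extract_pdf_text("catalog.pdf")
excerpt = extract_pdf_text("catalog.pdf", pages=[0, 1])
```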
How It Works in Practice: The Sequence of Actions
When you set up an Advanced Step, you’re defining an assembly line for your agent (sketched in code after this list):
User Input: The user provides the initial material (e.g., a PDF file or a website URL).
Advanced Step Execution: The agent performs the action you set up (e.g., extracts the text from the PDF or scrapes the URL).
Contextualization for the AI: The result of the step (the extracted text, the site content) is automatically provided as context for the AI.
Final Answer Generation: The AI, now with the necessary information, runs your main prompt (e.g., "Create a product description based on the extracted text") and delivers the result.
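As a mental model, that assembly line can be written in a few lines of Python. Every name here is a stand-in (this is not Tess AI's actual API), but the ordering matches the four stages above:

```python
# Hypothetical sketch of the Advanced Step pipeline -- all names are
# stand-ins for illustration, not Tess AI's actual API.

def run_advanced_step(user_input: str) -> str:
    """Stand-in for a configured step, e.g. PDF extraction or web scraping."""
    return f"(text extracted from {user_input})"

def call_llm(prompt: str) -> str:
    """Stand-in for the model call that generates the final answer."""
    return f"(answer generated from a prompt of {len(prompt)} chars)"

def run_agent(user_input: str, main_prompt: str) -> str:
    # 1. User input: the initial material (a PDF path, a URL, ...).
    step_output = run_advanced_step(user_input)   # 2. Advanced Step execution.
    # 3. Contextualization: the step's output becomes context for the AI.
    full_prompt = f"{main_prompt}\n\nExtracted context:\n{step_output}"
    # 4. Final answer generation: the AI runs the main prompt with that context.
    return call_llm(full_prompt)

print(run_agent("catalog.pdf", "Create a product description based on the extracted text."))
```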
Points to Watch for Effective Use
Impact on Processing Time
Keep in mind that each Advanced Step is an extra task in your agent’s initial workflow. This can slightly increase the time it takes to start the conversation (chat agent) or deliver the final result (text agent). So use them strategically, only when they’re really needed.
They don’t run throughout an entire chat
Since the main goal of a step is to complement the training with advanced tasks and resources, it runs once, at the beginning of a chat (chat agent) or at the start of processing (text agent).
Example:
Consider an agent that creates events in Google Calendar. The event-creation step isn’t triggered in the middle of a chat conversation; it runs at the beginning, right after the user fills in the required inputs.
So, to build an agent that creates events in my calendar, I would need to (see the sketch after this list):
Include a step to get the schedule information (App Integration)
Run an AI assistant that would check the available slots and set the new time
Collect the required information to create an event via inputs
Use the event creation step
In other words, all of this happens before you even start chatting with the bot.
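To picture what the “get schedule information” and “event creation” steps do behind the scenes, here is a hedged sketch using the official google-api-python-client. It assumes `service` is an already-authorized Calendar API client (the OAuth setup is omitted), and the slot-checking in step 2 is only a placeholder, since in the agent that reasoning is done by the AI assistant:

```python
# Illustrative sketch of the calendar flow above -- not Tess AI's implementation.
# Assumes `service` is an authorized googleapiclient Calendar v3 client.
import datetime

def check_and_create_event(service, summary: str, start_iso: str, end_iso: str):
    # Step 1: get the schedule information (the "App Integration" step).
    now = datetime.datetime.now(datetime.timezone.utc).isoformat()
    upcoming = service.events().list(
        calendarId="primary", timeMin=now,
        singleEvents=True, orderBy="startTime",
    ).execute().get("items", [])

    # Step 2: in the real agent, an AI assistant would compare these busy
    # slots against the requested time; here we only collect them.
    busy = [(e.get("start"), e.get("end")) for e in upcoming]

    # Steps 3 and 4: with the required inputs collected, run the
    # event-creation step.
    event = {
        "summary": summary,
        "start": {"dateTime": start_iso, "timeZone": "UTC"},
        "end": {"dateTime": end_iso, "timeZone": "UTC"},
    }
    return service.events().insert(calendarId="primary", body=event).execute()
```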
The Crucial Connection with the Prompt
It’s not enough to just add an Advanced Step; you need to instruct the AI in your prompt on how to use the information it provides.
Example: If you added a "PDF text extraction" step, your main prompt should contain something like:
“Based on the text extracted from the document, identify the main benefits of the product and write three paragraphs about them: pdf-text”
This instruction connects the action of the step with the LLM’s reasoning, ensuring that the collected information is used effectively.
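One way to picture this connection: the step’s output fills a placeholder in your prompt before the model ever sees it. The sketch below is illustrative only; the `pdf-text` name mirrors the example above, but the exact placeholder syntax in Tess AI may differ:

```python
# Illustrative only: how a step's output might be merged into the main prompt.
# The exact placeholder syntax in Tess AI may differ from "{pdf-text}".
main_prompt = (
    "Based on the text extracted from the document, identify the main "
    "benefits of the product and write three paragraphs about them: {pdf-text}"
)

# Each configured step contributes its result under a known name.
step_outputs = {"pdf-text": "(text produced by the PDF extraction step)"}

# Simple substitution: every placeholder is replaced by its step's output.
final_prompt = main_prompt
for name, value in step_outputs.items():
    final_prompt = final_prompt.replace("{" + name + "}", value)

print(final_prompt)
```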