Introduction
In this tutorial, you will learn to combine Shinkai tools to create an AI tool that extracts the text content of a .pptx presentation, generates a lesson text about the presentation, and generates an audio file of this lesson. This tool is available in the Shinkai AI Store. You will learn how to :
- build a tool and add features step by step using the Shinkai AI-assisted tool builder
- combine Shinkai tools efficiently (optional features, customizability, config validation, error handling)
- implement Optical Character Recognition
- implement text to speech
- use the created tool
Full Code
- the LLM works on smaller instructions and is less likely to get confused, leading to a better implementation of your instructions
- if needed, prompting the LLM to edit, fix, or improve the code generated so far is faster and cheaper, because there is less code to interpret and regenerate each time (compared to editing the full code of the entire tool)
Prerequisites
To follow this tutorial, you will need :
- the latest version of Shinkai Desktop installed
- Tesseract installed for OCR
- the ElevenLabs text-to-speech tool from the Shinkai AI Store, installed and configured
- an ElevenLabs API key with sufficient credits
Part 0 : Trying to build the full tool in one go with Shinkai AI-assisted tool builder
You can try to build a working prototype of the full tool using a single detailed prompt and a capable LLM. In the tool creation UI, select a capable LLM (e.g. gpt_4_1, shinkai_free_trial), select Python, activate the 2 tools ‘shinkai_llm_prompt_processor’ and ‘eleven_labs_text_to_speech’, write a prompt that describes the tool well, and execute it. For a good result, your prompt should be detailed and clearly describe :
- the goal of the tool to create and its steps
- how each of the selected tools should be used
- what you would want in configuration versus inputs
- which features should be optional
- how to handle errors

Part 1 : Extracting the text content from a .pptx file
You can build the content extraction feature first using the AI assistance, and add the other features later. To do so, do not select any tool, as this feature does not rely on any, and use a good prompt. Because the prompt covers just one feature, it stays short, so you can make it very thorough and add details on how to build the tool without risking overwhelming the LLM. Here is an example prompt to create a tool that extracts the text content from a .pptx file :
- imports libraries and the shinkai tools needed (e.g. Tesseract, ElevenLabs)
- defines the configuration for the Tesseract executable path
- creates the output class for the content, error messages and status
- defines 3 functions to extract text from text blocks, tables and charts
- creates a function to extract the content from shapes using the functions defined above, and uses Tesseract OCR for picture shapes
- defines a function to read the presentation from either URL or local file path
- creates a function that applies the content extraction slide by slide and shape by shape
- implements a validation function to stop the tool and log errors if there are issues with the Tesseract OCR implementation
- defines a run function using all the functions defined above.
- includes a step to check whether the extracted content is empty. This step is useful because the tool later uses the extracted content to generate a lesson text : the check ensures there is content and, if there is none, stops the tool and informs the user, saving compute and time.
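The extraction and empty-content check listed above can be illustrated with a minimal sketch that uses only the Python standard library. This is not the code the tool builder generates (which would more likely rely on a dedicated .pptx library and on Tesseract for picture shapes). It only relies on the fact that a .pptx file is a ZIP archive whose slides are stored as XML files under ppt/slides/. The function names `extract_slide_texts` and `is_empty_content` are hypothetical, slides are ordered by the number in their file name (which usually, but not always, matches the presentation order), and charts and pictures are not covered.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace: text runs in slide XML are stored in <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
# Matches slide parts such as ppt/slides/slide3.xml (but not their .rels files).
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_texts(pptx_path):
    """Return a list of (slide_number, text) tuples sorted by slide file number.

    Hypothetical helper: reads the raw slide XML and joins all text runs.
    """
    slides = []
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if not match:
                continue
            root = ET.fromstring(archive.read(name))
            runs = [node.text for node in root.iter(A_NS + "t") if node.text]
            slides.append((int(match.group(1)), " ".join(runs)))
    return sorted(slides)


def is_empty_content(slides):
    """True when no slide contains any non-whitespace text."""
    return not any(text.strip() for _, text in slides)
```

In a run function, a check such as `if is_empty_content(slides): ...` would return an error status before any LLM call is made, which is the compute-saving behaviour described in the last item of the list.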
Part 2 : Adding an LLM prompt processor to generate a lesson text
Now you can use the AI-assisted tool builder to add a step that generates the lesson text, using the slides content extracted in the first step. To do so, activate the tool ‘shinkai_llm_prompt_processor’, and use a prompt similar to this one :
- adds to the imports the ‘shinkai_llm_prompt_processor’ tool
- adds an input for additional instructions to generate the lesson text, so that the user can customize it. Set default to ‘none’.
- adds an output for the generated lesson
- adds a step to define a detailed prompt to generate an optimal lesson text. Give some context describing the type of content the LLM will use and its specificities. Include formatting instructions. Include the optional additional instructions coming from the user. Organise it well and use tags to make things clear for the LLM.
- calls the LLM prompt processor tool using the prompt defined above.
- cleans the generated lesson text of special characters, in case the LLM includes some despite our prompt format instructions
- includes the cleaned generated text in the output.
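The prompt-building and cleaning steps above could look like the following sketch. It is only an illustration under assumptions : the function names, the tag names and the set of characters removed are hypothetical choices, not what the tool builder necessarily produces. The actual call to ‘shinkai_llm_prompt_processor’ is deliberately left out, because its exact signature is not shown in this tutorial.

```python
import re


def build_lesson_prompt(slides_content, additional_instructions="none"):
    """Assemble a tagged prompt from the extracted slide content.

    Tag names are hypothetical; they only serve to separate the parts
    of the prompt clearly for the LLM.
    """
    extra = additional_instructions.strip()
    parts = [
        "<context>",
        "You are given the text extracted from a .pptx presentation. "
        "It may contain fragments, table cells and OCR noise.",
        "</context>",
        "<task>Write a clear lesson that covers the slides in order.</task>",
        "<format>Plain text only: no markdown, no bullet symbols, no emojis.</format>",
    ]
    # The optional user instructions are added only when they are not the default.
    if extra and extra.lower() != "none":
        parts.append(f"<additional_instructions>{extra}</additional_instructions>")
    parts.append(f"<slides_content>\n{slides_content}\n</slides_content>")
    return "\n".join(parts)


def clean_lesson_text(text):
    """Strip common markdown symbols, then collapse repeated spaces and blank lines."""
    text = re.sub(r"[*_#`~]", "", text)
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Cleaning the text matters most when the audio step is enabled, since leftover symbols such as asterisks may otherwise be read aloud or disturb the speech.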
Part 3 : Adding an optional text-to-speech feature to create an audio file of the lesson
Now you can use the AI-assisted tool builder to add a final optional step which generates an audio file of the cleaned lesson text generated by the 2nd feature of the tool. To do so, activate the tool ‘eleven_labs_text_to_speech’, and use a prompt similar to this one :
- adds the ‘eleven_labs_text_to_speech’ tool to the imports, and also ‘shutil’ (used for file operations).
- adds to the config the option to generate the audio
- adds to the output the optional audio file
- defines a function to get the name of the .pptx file. It will be used to save the audio file with the same name.
- adds a step to the validate_config function to also check the configuration of the optional audio generation.
- adds a step to the run function to call the ‘eleven_labs_text_to_speech’ tool. This step is optional according to the configuration.
- adds a step to change the name of the audio file generated by the text-to-speech tool : make it more user-friendly by simply using the name of the original .pptx file.
- includes an error message if the audio file generation failed.
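The configuration check and the file-naming steps above can be sketched as follows. The helper names and the way the configuration value is passed are assumptions made for illustration. The call to ‘eleven_labs_text_to_speech’ itself is not shown, and the sketch simply assumes that the tool has produced an audio file at some path.

```python
import shutil
from pathlib import Path


def audio_enabled(config_value):
    """Validate the 'yes'/'no' audio option (hypothetical config value) and return a bool."""
    value = str(config_value).strip().lower()
    if value not in ("yes", "no"):
        raise ValueError("The audio generation option must be 'yes' or 'no'.")
    return value == "yes"


def get_presentation_name(pptx_source):
    """Return the .pptx file name without its extension, for a local path or a URL."""
    normalized = pptx_source.replace("\\", "/").split("?")[0].rstrip("/")
    last_segment = normalized.split("/")[-1]
    return Path(last_segment).stem or "presentation"


def copy_audio_with_pptx_name(generated_audio, pptx_source, output_dir):
    """Copy the generated audio file under the presentation's name.

    Returns the new path, or None when the generated file is missing,
    so that the caller can report an error message.
    """
    source = Path(generated_audio)
    if not source.is_file():
        return None
    target = Path(output_dir) / f"{get_presentation_name(pptx_source)}{source.suffix}"
    shutil.copyfile(source, target)
    return str(target)
```

Returning None instead of raising keeps the audio step optional in spirit : the lesson text can still be returned to the user along with an error message about the audio.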
Part 4 : Troubleshooting
If the tool created or modified by the AI assistance generates errors when you run it, consider these steps :
- Provide Feedback : Copy the error message and the relevant code snippet back into the AI tool builder chat. Explain what input caused the error and ask the AI to fix it.
- Use a More Capable LLM : Some LLMs are better at coding tasks than others. If you’re using a less capable model, try switching to one known for stronger coding abilities.
- Refine Your Prompts : Make your instructions even more specific. Break down complex requests into smaller sub-tasks. Clearly define expected inputs, outputs, and error conditions for each part.
- Isolate the Problem : If the multi-step tool fails, try running only the first step (e.g., text extraction) by commenting out later steps or using a simpler version of the tool. Once the first step works, incrementally add back the next steps until you find where the error occurs.
- Examine Intermediate Outputs : Modify the code temporarily to print or output intermediate results (like the raw extracted text before the LLM call, or the LLM output before cleaning/TTS) to see if the data looks as expected at each stage.
- Seek Community Support : For additional help, contact the Shinkai support team or join the Shinkai community on Discord to ask questions and share your problem.
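As an illustration of examining intermediate outputs, a small temporary helper can log the size and the beginning of each stage's result. This is a sketch only : the helper name `preview` and the logger name are hypothetical, and you would remove or silence these calls once the problem is found.

```python
import logging

logger = logging.getLogger("pptx_lesson_tool")


def preview(stage, text, max_chars=200):
    """Log the length and first characters of an intermediate result; return the summary."""
    text = text or ""
    snippet = text[:max_chars].replace("\n", " ")
    summary = f"[{stage}] {len(text)} chars: {snippet}"
    logger.info(summary)
    return summary
```

For example, calling `preview("extraction", extracted_text)` before the LLM call and `preview("lesson", lesson_text)` before cleaning shows quickly whether a stage returned nothing or something unexpected.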
Part 5 : Perfecting your tools combination : useful prompts
For complex tools that chain multiple steps and call other tools, careful design is crucial for usability, reliability, and maintainability. Here are common areas for refinement and example prompts you can use with the AI tool builder to improve your PPTX-to-audio tool :
- Changing Configurations to Inputs : Decide carefully what should be a fixed setting (config) versus a per-run choice (input). Things that change often belong in inputs.
Part 6 : Improving the metadata of the tool
Shinkai automates tool metadata generation, but you can enhance it. Good tool metadata should include :
- an explicit tool title
- a thorough description (features, options, requirements, extra information)
- explicit descriptions for configurations and inputs
- adequate usable keywords to trigger the tool

Part 7 : Using the tool ‘PPTX Content Extractor With OCR And Audio Lesson Generator’
7.1 Installing extra components and setting up configurations
Install Tesseract for OCR, and set its executable path in the configuration of the ‘PPTX Content Extractor With OCR And Audio Lesson Generator’ tool. Install the ‘eleven_labs_text_to_speech’ tool from the Shinkai AI Store. Get an ElevenLabs API key with some credits. Go to the configuration tab of this ElevenLabs Shinkai tool, set your API key and pick a voice. Set audio generation to ‘yes’ or ‘no’ in the configuration of the ‘PPTX Content Extractor With OCR And Audio Lesson Generator’.
7.2 Usage examples
To generate an audio lesson from a .pptx file, set audio generation to ‘yes’ in the configuration and include the filename in your prompt.

