How to Set Up Llama Agentic System with Llama 3.1 8B in Ollama

The Llama agentic system lets you use Meta’s Llama Stack to build apps with agentic workflows. Meta has continued to refine the repo over the last few weeks, and I’ll walk you through setting it up on your own machine with what is available today.
My personal computer isn’t powerful enough for this demo, so I set it up on an AWS EC2 instance running a marketplace machine image (Galaxys Deep Learning Base GPU AMI (Ubuntu 20.04)) on a g4dn.xlarge instance with 4 vCPUs and 1 GPU.
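Before going further, it’s worth confirming that the GPU and driver are visible on the instance (assuming your AMI, like this one, ships with the NVIDIA drivers preinstalled):
# Confirm the GPU and driver are visible
nvidia-smi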
Get API Keys
I’m only going to test with Brave Search in this example. But once you get the hang of this and start integrating your own tools, you’ll need more API keys for your work.
Brave Search
You will need a Brave Search API key to run some of the example workflows in the repo. Go to https://brave.com/search/api/, create an account, and set up a free subscription. A credit card is required even for the free tier. Grab your API key once setup is complete.
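If you want to sanity-check the key before wiring it into the stack, you can query Brave’s web search endpoint directly. This is a minimal sketch; YOUR_API_KEY is a placeholder for the key you just created:
# Quick sanity check of the Brave Search API key
curl -s "https://api.search.brave.com/res/v1/web/search?q=shawarma" \
  -H "Accept: application/json" \
  -H "X-Subscription-Token: YOUR_API_KEY"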

Set Up Llama Agentic System
I’m going to set up the Llama distribution using Ollama.
# Clone the repo to your computer
git clone https://github.com/meta-llama/llama-stack-apps.git
# Create a conda environment
conda create -n agentic_env python=3.10
# Activate the conda environment
conda activate agentic_env
# Switch to the folder where the repo is stored
cd llama-stack-apps
# Install modules from the requirements file
pip install -r requirements.txt
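The requirements install the llama CLI that the build and run steps below depend on. A quick way to confirm it landed on your path:
# Verify the llama CLI is available
llama --help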
Installing and setting up the Ollama distribution
curl -fsSL https://ollama.com/install.sh | sh

ollama pull llama3.1:8b-instruct-fp16
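The fp16 weights for the 8B model come to roughly 16 GB (8 billion parameters at 2 bytes each), so the pull takes a while. Once it finishes, you can confirm the model is available locally:
# List locally available models to confirm the download finished
ollama list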

Now run the Llama 3.1 model and test it out a bit:
ollama run llama3.1:8b-instruct-fp16
I asked it to write a poem about shawarma

Exit Ollama with /bye. Now, to configure the llama distribution, run the following command.
llama stack build local-ollama --name 8b-instruct
It will ask you a series of configuration questions. You can use the answers in the screenshot below as a guide. Enter your Brave Search API key when it asks for it.

Now launch the llama server.
llama stack run local-ollama --name 8b-instruct --port 5000
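Before moving on, you can confirm the server is actually listening on port 5000 from another terminal (a generic port check, nothing llama-stack-specific):
# Confirm the server is listening on port 5000
ss -ltn | grep 5000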

Running example scripts
Now leave the server running. Then, in another terminal with your conda environment active, run this from the llama-stack-apps folder:
python examples/scripts/vacation.py [::] 5000 --disable_safety
This example requires the Brave Search API key you configured earlier.
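The two positional arguments are the host and port of the stack server you just launched; [::] is the IPv6 wildcard address in bracket notation, and --disable_safety skips the safety shields. If [::] gives you connection trouble, pointing at localhost should also work (this is my assumption, not something documented in the repo):
# Same example via localhost instead of [::] (assumes the script accepts any resolvable host)
python examples/scripts/vacation.py localhost 5000 --disable_safety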

If you look at the other terminal where the server is running, you should see some logs:

If you get a “generator cancelled” error, the server is having trouble reaching Ollama. Run Ollama in another terminal, as in the shawarma poem step, so the model “warms up”:
ollama run llama3.1:8b-instruct-fp16
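If you’d rather warm the model up without an interactive session, you can hit Ollama’s local HTTP API directly (it listens on port 11434 by default):
# One-shot generation request that loads the model into memory
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-fp16",
  "prompt": "hello",
  "stream": false
}'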
References:
1. https://github.com/meta-llama/llama-stack-apps/issues/40