
How to Set Up Llama Agentic System with Llama 3.1 8B in Ollama

Obinna Onyema
4 min read · Sep 10, 2024


The Llama agentic system lets you build apps with agentic workflows on top of Meta’s Llama Stack. Meta has continued to refine the repo over the last few weeks, so I’ll walk you through setting it up on your own computer with what is available today.

My personal computer isn’t powerful enough for this demo, so I set it up on AWS EC2 running a marketplace machine image (Galaxys Deep Learning Base GPU AMI (Ubuntu 20.04)) on a g4dn.xlarge instance with 4 vCPUs and 1 GPU.
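If you’d rather launch the instance from the command line, the AWS CLI call looks roughly like the sketch below. The AMI ID, key pair, and security group are placeholders, not real values; substitute what your own account and the marketplace AMI listing give you.

# Launch a g4dn.xlarge instance (the IDs below are placeholders)
aws ec2 run-instances \
  --image-id ami-XXXXXXXXXXXXXXXXX \
  --instance-type g4dn.xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-XXXXXXXXXXXXXXXXX \
  --count 1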


Get API Keys

I’ll only be testing with Brave Search in this example. But once you get the hang of it and start integrating your own tools, you’ll need more API keys for your work.

Brave Search

You will need a Brave Search API key to run some of the example workflows in the repo. Go to https://brave.com/search/api/, create an account, and set up a free subscription. They require a credit card even for the free tier. Grab your API key once setup is complete.

Brave Search API keys screen
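Before wiring the key into the stack, you can sanity-check it directly against Brave’s web search endpoint. The URL and header below reflect Brave’s API docs as I understand them; replace YOUR_API_KEY with the key from your dashboard. A valid key should return a JSON payload of search results.

# Quick sanity check for your Brave Search API key
curl -s "https://api.search.brave.com/res/v1/web/search?q=llama+stack" \
  -H "Accept: application/json" \
  -H "X-Subscription-Token: YOUR_API_KEY"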

Set Up Llama Agentic System

I’m going to set up the Llama distribution using Ollama.

# Clone the repo to your computer
git clone https://github.com/meta-llama/llama-stack-apps.git

# Create a conda environment
conda create -n agentic_env python=3.10

# Activate the conda environment
conda activate agentic_env

# Switch to the folder where the repo is stored
cd llama-stack-apps

# Install modules from the requirements file
pip install -r requirements.txt
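Before moving on, it’s worth confirming that the requirements install put the llama CLI on your path, since the build and run steps below depend on it:

# Sanity check: this should print the CLI's usage/help text
llama --help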

Installing and setting up the Ollama distribution

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Download the Llama 3.1 8B instruct model for Ollama
ollama pull llama3.1:8b-instruct-fp16

Now run the Llama 3.1 model and test it out a bit:

ollama run llama3.1:8b-instruct-fp16

I asked it to write a poem about shawarma.

Shawarma poetry from Llama 3.1
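Ollama also exposes a local REST API (port 11434 by default), so if you’d rather script the test than chat interactively, a request like this should produce a similar poem:

# Query the model through Ollama's local REST API instead of the chat prompt
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-fp16",
  "prompt": "Write a short poem about shawarma",
  "stream": false
}'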

Exit Ollama with /bye. Now, to configure the Llama distribution, run the following command.

llama stack build local-ollama --name 8b-instruct

It will ask you a series of configuration questions. You can use the answers in the screenshot below as a guide. Enter your Brave Search API key when prompted.

Configuring the distribution for Ollama

Now launch the llama server.

llama stack run local-ollama --name 8b-instruct --port 5000

Llama Stack server up and running

Running example scripts

Leave the server running. Then, in another terminal, run this from the llama-stack-apps folder with your conda environment active:

python examples/scripts/vacation.py [::] 5000 --disable_safety

This example requires the Brave Search API key you configured earlier.

Vacation script running

If you look at the other terminal where the server is running, you should see some logs:

Server logs

If you get a “generator cancelled” error, the server is having trouble running Ollama. Open another terminal (as in the shawarma poetry step) and run Ollama so the model “warms up”:

ollama run llama3.1:8b-instruct-fp16
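If the model keeps getting unloaded between requests, you can also ask Ollama to keep it resident using the keep_alive parameter of its REST API (as I understand it, -1 keeps the model loaded indefinitely):

# Keep the model loaded so the stack server doesn't time out waiting for Ollama
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.1:8b-instruct-fp16",
  "keep_alive": -1
}'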

References:

1. https://github.com/meta-llama/llama-stack-apps/issues/40

2. https://github.com/meta-llama/llama-stack-apps/issues/56

3. https://noblefilt.com/metas-llama-agentic-system/
