The Complete Guide to Ollama + Local AI

26. 05. 2025 · 1 min read · intermediate


Run AI models locally. No API keys, no fees, full control.

What is Ollama

Ollama is like Docker for LLMs: it downloads, configures, and runs AI models locally, behind a simple CLI and a REST API.

Installation

macOS / Linux

curl -fsSL https://ollama.com/install.sh | sh

Run a model

ollama run llama3.2

Download a model

ollama pull nomic-embed-text

Available Models

  • llama3.2 (3B) — fast, good for chat
  • llama3.1 (8B/70B) — more powerful
  • mistral (7B) — good performance/speed ratio
  • codellama (7B/34B) — for code
  • nomic-embed-text — embeddings
  • qwen2.5vl — vision model

REST API

Generate

curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'

Chat

curl http://localhost:11434/api/chat -d '{"model":"llama3.2","messages":[{"role":"user","content":"Hi"}]}'

Embeddings

curl http://localhost:11434/api/embeddings -d '{"model":"nomic-embed-text","prompt":"Hello world"}'
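The same three endpoints can be called from any HTTP client. A minimal sketch that builds the matching request bodies in Python (paths and field names taken from the curl examples above; note that generate and chat stream NDJSON by default unless you pass "stream": false):

```python
import json

BASE = "http://localhost:11434"

# Request bodies matching the three curl examples above.
generate_body = json.dumps({"model": "llama3.2", "prompt": "Hello", "stream": False})
chat_body = json.dumps({
    "model": "llama3.2",
    "messages": [{"role": "user", "content": "Hi"}],
    "stream": False,
})
embed_body = json.dumps({"model": "nomic-embed-text", "prompt": "Hello world"})

# POST each body to BASE + path with the HTTP client of your choice.
for path, body in [("/api/generate", generate_body),
                   ("/api/chat", chat_body),
                   ("/api/embeddings", embed_body)]:
    print(BASE + path, body)
```

With "stream": false the server returns one JSON object instead of a line-by-line stream, which is easier to parse in scripts.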

Python Integration

import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain Docker in one sentence."}],
)
print(response["message"]["content"])

Modelfile — Custom Model

FROM llama3.2
SYSTEM "You are a helpful coding assistant. Respond in English."
PARAMETER temperature 0.7

Create and run it with ollama create <name> -f Modelfile, then ollama run <name>.

Hardware Requirements

  • 3B model: 4 GB RAM
  • 7B model: 8 GB RAM
  • 13B model: 16 GB RAM
  • 70B model: 48+ GB RAM
  • Apple Silicon: unified memory = ideal for local AI
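The table above can double as a quick sanity check before pulling a model. A hypothetical helper (the thresholds mirror the list above; real usage also depends on quantization and context length):

```python
# Approximate RAM needed per model size, from the list above (GB).
RAM_REQUIREMENTS = {"3B": 4, "7B": 8, "13B": 16, "70B": 48}

def models_that_fit(available_gb: float) -> list[str]:
    """Return the model sizes whose approximate RAM requirement fits."""
    return [size for size, gb in RAM_REQUIREMENTS.items() if gb <= available_gb]

print(models_that_fit(16))  # a 16 GB machine covers 3B, 7B and 13B models
```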

Use Cases

  • Coding assistant (offline)
  • RAG (Retrieval Augmented Generation)
  • Document analysis
  • Embeddings for search
  • Experiments without API costs
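The embeddings-for-search and RAG use cases both reduce to ranking documents by vector similarity. A minimal sketch of the ranking step using cosine similarity — the vectors would come from the /api/embeddings endpoint shown earlier; the ones below are made-up toy values:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for real embeddings from nomic-embed-text.
query = [0.1, 0.9, 0.0]
docs = {"doc_a": [0.1, 0.8, 0.1], "doc_b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query; the best match wins.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # doc_a is closest to the query
```

In a real RAG pipeline you would embed the query and all documents with the embeddings endpoint, rank as above, and feed the top matches into the chat prompt as context.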

Why Local AI

No API fees. No network latency. Full control over your data. And on Apple Silicon it is surprisingly fast.

Tags: ollama, ai, llm, local

CORE SYSTEMS team

We build core systems and AI agents that keep operations running. 15 years of experience with enterprise IT.