Beginner's Guide to Llama Models - Global Intelligence and Insight Platform: IT Innovation, ETF Investment, plus Health Wellbeing

This guide is for you if you are new to Llama, a free and open-source large language model. You will find some basic information and common questions.

What is Llama?

Llama (Large Language Model Meta AI) is a family of large language models (LLMs). It is Meta’s (formerly Facebook’s) answer to ChatGPT.

But the two companies take different paths. ChatGPT is proprietary: you cannot inspect the model’s code, its training data, or its training method. Llama is openly released: the model code and weights are available for download, and the training data and method are described in Meta’s published papers.

Llama was the first major open-source large language model, and it gained instant popularity upon release. In addition to being free and open-source, it is small enough to run on a personal computer. The 7-billion and 13-billion parameter models are very usable on a good consumer-grade PC.

How does Llama work?

Llama is an AI model designed to predict the next word. You can think of it as a glorified autocomplete. It is trained on text from the internet and other public datasets. Llama 2 was trained on about 2 trillion tokens (words and word pieces).
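
To make "predict the next word" concrete, here is a toy sketch. It is not how Llama is implemented: Llama uses a large neural network, while this example only counts which word follows which in a tiny made-up corpus. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# A tiny, made-up training corpus (an assumption for illustration only).
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" most often here
print(predict_next(model, "sat"))  # "on" is the only word seen after "sat"
```

A real LLM does the same job with far more context: it looks at all the preceding text, not just the previous word, and assigns a probability to every possible next token.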

You may wonder why the Llama model seems to be intelligent: It gives you sensible answers to difficult questions. It can rewrite your essay. It can give you pros and cons of certain things.

The training text was written by humans. In some sense, it is a slice of human thought projected onto a medium. By learning how to complete a sentence, the model also learns an aspect of being human.

Does the Llama model know logic? There are two opposing views. One view is no, because what the model was designed to learn is correlation: it just predicts the most probable next word, nothing more. The other view is yes. Suppose the training text is a murder story, and the model must learn to complete the last sentence, “The murderer is”. To predict the next word accurately, it has no choice but to learn logical deduction.

Why use Llama instead of ChatGPT?

ChatGPT needs zero setup, and a free version is available. It is indeed highly accessible. So why use Llama? Here are the reasons:

Privacy. You can use Llama locally on your own computer. You don’t need to worry about the questions you ask being stored on a company’s server indefinitely.
Confidentiality. You may not be able to use ChatGPT for work-related queries because you are bound by a non-disclosure agreement. You don’t have an NDA with OpenAI, after all.
Customization. There are many fine-tuned models you can run locally. If you don’t like the answers of one model, you can switch to another.
Train your model. Finally, you have the opportunity to train your own model using techniques such as LoRA.
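
The last item mentions LoRA (low-rank adaptation). The article does not describe how it works, so the following is only a back-of-the-envelope sketch under the usual formulation: a large weight matrix is frozen, and two small matrices of rank r are trained instead. The layer size of 4096 and rank of 8 are illustrative assumptions, not settings taken from this guide.

```python
def full_params(d_in, d_out):
    """Number of weights in one full d_out x d_in matrix."""
    return d_in * d_out

def lora_trainable_params(d_in, d_out, rank):
    """Trainable weights when the update is factored as B (d_out x r) times A (r x d_in)."""
    return rank * (d_in + d_out)

# Illustrative numbers (assumptions): one 4096 x 4096 layer, rank 8.
full = full_params(4096, 4096)
lora = lora_trainable_params(4096, 4096, 8)

print(full)                       # 16777216
print(lora)                       # 65536
print(f"{lora / full:.2%}")       # share of weights that are actually trained
```

Training so few weights per layer is what makes fine-tuning feasible on consumer hardware.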

What can you do with Llama models?

You can use Llama models the same way you use ChatGPT.

Chat. Just ask questions about things you want to know.
Coding. Ask for a short program to do something in a specific computer language.
Outlines. Ask for an outline of a technical topic.
Creative writing. Let the model write a story for you.
Information extraction. Summarize an essay. Ask specific questions about an essay.
Rewrite. Write your paragraph in a different tone and style.

What language does Llama support?

Mostly English. The training data is 90% English.

Other languages, including German, French, Chinese, Spanish, Dutch, Italian, Japanese, Polish, and Portuguese, make up the rest of the training data. But don’t count on them.

This means you shouldn’t use Llama for translation tasks.

What computer hardware do I need?

It depends on the model size. The following is the VRAM needed to run a GPTQ model on a GPU card.

Model	8-bit	4-bit
7B	10 GB	6 GB
13B	20 GB	10 GB
30B	40 GB	20 GB
70B	80 GB	40 GB

GPU VRAM requirement.

And the following is the RAM needed for GGML models (for Mac, or for CPU on Windows or Linux).

Model	4-bit quantized
7B	4 GB
13B	8 GB
30B	20 GB
70B	39 GB

RAM requirement.
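
As a quick way to use the two tables, the sketch below looks up the largest model that fits in a given amount of memory. The numbers are copied from the tables above, reading the last two rows as the 30B and 70B models; the function names are hypothetical helpers, not part of any Llama tool.

```python
# Memory needs in GB, copied from the tables above.
GPTQ_VRAM_GB = {
    "7B": {8: 10, 4: 6},
    "13B": {8: 20, 4: 10},
    "30B": {8: 40, 4: 20},
    "70B": {8: 80, 4: 40},
}
GGML_RAM_GB = {"7B": 4, "13B": 8, "30B": 20, "70B": 39}

def largest_gptq_model(vram_gb, bits=4):
    """Largest GPTQ model whose listed VRAM need fits in vram_gb, or None."""
    fitting = [name for name, need in GPTQ_VRAM_GB.items() if need[bits] <= vram_gb]
    return fitting[-1] if fitting else None  # dicts are ordered smallest to largest

def largest_ggml_model(ram_gb):
    """Largest 4-bit GGML model whose listed RAM need fits in ram_gb, or None."""
    fitting = [name for name, need in GGML_RAM_GB.items() if need <= ram_gb]
    return fitting[-1] if fitting else None

print(largest_gptq_model(12))          # 13B (4-bit needs 10 GB)
print(largest_gptq_model(12, bits=8))  # 7B (8-bit needs 10 GB)
print(largest_ggml_model(16))          # 13B (needs 8 GB)
```

Treat the result as a rough guide: the operating system and other programs also need memory.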

What are quantized models?

Quantization is a method to reduce a model’s size while preserving most of its quality. The benefit to you is that the model takes up less space on your hard drive and requires less RAM to run.
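
The idea can be shown with a toy example. The sketch below uses the simplest scheme, symmetric round-to-nearest: each weight is stored as a small integer plus one shared scale factor. This is an assumption for illustration; the GPTQ and GGML formats use more elaborate methods, and the sample weights are made up.

```python
def quantize(weights, bits=4):
    """Map floats to signed integers of the given bit width, sharing one scale."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]

# Made-up weights for illustration.
weights = [0.12, -0.53, 0.98, -0.05, 0.33]
codes, scale = quantize(weights)
restored = dequantize(codes, scale)

print(codes)                              # small integers that fit in 4 bits each
print([round(r, 2) for r in restored])    # close to, but not exactly, the originals
```

Each restored value differs from the original by at most half the scale, which is the small quality loss traded for the smaller size.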

What are the different versions of Llama?

Official models

Meta has released two versions of the official models: Llama 1 and Llama 2.

Llama 1

Llama 1 came out in February 2023. The release caused great excitement because it was the first important open-source LLM. It was a big surprise back then, though it now seems like a long time ago. Llama 1 spurred many efforts to fine-tune the model and optimize it to run locally. Running an LLM locally was initially thought to be impossible, but hobbyists solved the problem in a short time.

Llama 2

Although holding great promise, Llama 1 was released with a license that does not allow commercial use. This has limited the adoption of the Llama 1 model.

Llama 2 came out in July 2023. There are some incremental improvements in training and model architecture. The most significant change is the license terms: Llama 2 is free for commercial use. It is widely expected that this will spark a new round of development, like what happened with Stable Diffusion.

Fine-tuned models

Unlike with ChatGPT, you can make your own Llama model if you are unhappy with its responses. You do that by training it on additional data. This is called fine-tuning.

Here are some popular fine-tuned models.

WizardLM

WizardLM is a family of models fine-tuned with many instruction-following conversations. The novelty of this model is using an LLM to generate training data automatically.

Download links

Model	Base model	Download links
WizardLM 7B uncensored	Llama 1	GPTQ, ggml
WizardLM 13B V1.1	Llama 1	GPTQ, ggml
WizardLM 30B V1.0	Llama 1	GPTQ, ggml

WizardLM models

Vicuna

Vicuna is fine-tuned with user-shared ChatGPT conversations.

Model	Base model	Download links
Vicuna 7B v1.3	Llama 1	GPTQ, ggml
Vicuna 13B v1.3	Llama 1	GPTQ, ggml
Vicuna 30B v1.3	Llama 1	GPTQ, ggml

Vicuna models

How to compare the performance of models?

There are so many models to choose from. How do you know which is the best, whatever that means? How do you compare the Llama models with ChatGPT?

LMSYS hosts a leaderboard that compares the performance of LLMs, including proprietary ones like ChatGPT. It reports three metrics:

Chatbot Arena: Users are shown the answers of two LLMs blindly and pick the better one. A ranking score is then calculated for each LLM.
MT-bench: GPT-4 judges the answers of each LLM. (This metric favors GPT models.)
Massive Multitask Language Understanding (MMLU): Test the LLM in 57 tasks, including elementary mathematics, US history, computer science, law, and more.
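
The guide does not say how the Chatbot Arena ranking score is calculated. One common way to turn pairwise votes into a ranking is an Elo-style rating, and the sketch below assumes that approach. The starting rating of 1000 and the K-factor of 32 are illustrative assumptions.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, a_wins, k=32):
    """Return both ratings after one blind comparison between A and B."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two models start level; a user prefers model A's answer.
model_a, model_b = update_elo(1000, 1000, a_wins=True)
print(model_a, model_b)  # 1016.0 984.0
```

Beating a higher-rated model moves the ratings more than beating a lower-rated one, so the ranking settles as votes accumulate.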

Which file format should I use?

If you have an Nvidia GPU card, the GPTQ format gives you the best performance.

If you use a Mac, or Windows or Linux without a GPU, use the GGML format.

How to install Llama models?

See the installation guide for Windows and the installation guide for Mac.

What is the software to use Llama?

Text-generation-webui is a graphical user interface for using Llama models. It is powerful and easy to use. I recommend this software for general users.

If you prefer a text-only experience and are comfortable using a terminal, llama.cpp is a good choice.

Can I use Llama commercially?

No for Llama 1.

Yes for Llama 2.
