RHEL AI

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_ai/1.4

Timeline

1950 Alan Turing proposes the Turing test.

English mathematician Alan Turing publishes “Computing Machinery and Intelligence,” in which he asks, “Can machines think?” In the paper, Turing proposes an experiment, now known as the “Turing test,” to assess the ability of a machine to exhibit intelligent behavior that is indistinguishable from that of a human.

In the Turing test, a human evaluator reviews text from a conversation between two participants: a human and a machine. A machine would pass the test if the human evaluator could not reliably tell the human participant from the machine.

Although the test has since been subject to criticism and debate, it remains an important milestone in computer science and AI.

1955 John McCarthy coins the term “artificial intelligence.”

In a proposal for the Dartmouth Summer Research Project on Artificial Intelligence to be held the following year (and which would kick off artificial intelligence as a field of study), American computer scientist John McCarthy and colleagues introduce the term “artificial intelligence.”

1956 Logic Theorist, “the first AI program,” proves mathematical theorems.

Allen Newell, Herbert A. Simon, and Cliff Shaw write Logic Theorist, considered by many to be the first program designed to conduct automated reasoning.

Logic Theorist proves theorems much as a human mathematician would, successfully proving 38 of the first 52 theorems in chapter 2 of the Principia Mathematica.

1970 Linnainmaa publishes a reverse mode of automatic differentiation, which later becomes known as backpropagation.

In his master’s thesis, Finnish mathematician and computer scientist Seppo Linnainmaa introduces what becomes known as backpropagation, a gradient estimation method, which is later used to train neural network models.

1983 James F. Allen introduces Allen’s interval algebra, the first widely used formalization of temporal events.

American computational linguist James F. Allen invents interval algebra, a calculus used in temporal reasoning.

Mid-1980s Neural networks start to be used widely.

Neural networks, trained using backpropagation algorithms, become widely used in AI applications.

1995 Russell and Norvig publish Artificial Intelligence: A Modern Approach

Computer scientists Stuart Russell and Peter Norvig publish Artificial Intelligence: A Modern Approach. The book goes on to become the most commonly used AI textbook at colleges and universities (used at more than 1,500 institutions throughout the world). Multiple editions have been released in the intervening years, with the most recent version appearing in 2020.

1997 IBM’s Deep Blue beats then-world chess champion Garry Kasparov.

Although Garry Kasparov defeated Deep Blue in their first six-game match a year earlier, Deep Blue won a 1997 rematch against Kasparov, which was the first defeat of a reigning world chess champion by a computer under tournament conditions.

1997 Dragon Systems releases Dragon NaturallySpeaking 1.0, often referred to as the first consumer-grade commercial voice recognition product.

Although the origins of voice recognition software go back as far as the 1920s, it wasn’t until 1997 that Dragon Systems released NaturallySpeaking 1.0 as its first continuous dictation product.

2011 IBM Watson defeats Ken Jennings on Jeopardy!

IBM Watson plays against champions Brad Rutter and Ken Jennings on the TV quiz show Jeopardy! and defeats both human contestants with a final score of $77,147, more than three times the scores of Jennings ($24,000) and Rutter ($21,600). The $1 million prize IBM Watson won for defeating the quiz-show champions was donated to charity.

2016 DeepMind’s AlphaGo defeats world champion Go player.

DeepMind’s AlphaGo program, powered by a deep neural network, beats Lee Sedol, winner of 18 Go world titles, in a five-game match. Despite its simple rules, Go is an extremely complex board game, with far more potential moves and board positions than chess. Google had acquired DeepMind in 2014, two years before the match.

2020 OpenAI introduces GPT-3.

OpenAI introduces GPT-3, a language model that uses deep learning to generate computer code, poetry, and other written content that is exceptionally similar to, and almost indistinguishable from, content written by humans.

2022 The ChatGPT chatbot debuts.

Developed by OpenAI, ChatGPT is introduced in November. It is initially built on the GPT-3 LLM. By January of the following year, ChatGPT has more than 100 million users.

2023 OpenAI releases GPT-4.

In March 2023, OpenAI releases GPT-4, which is generally regarded as an improvement over GPT-3.5. GPT-4 is multimodal; users can input images as well as text. Later that month, Google releases Bard, its chatbot, which is based on the LaMDA and PaLM LLMs.

2023 Google releases the Gemini large language model.

In December 2023, Google releases the Gemini LLM, the successor to LaMDA and PaLM 2.

2024 Google releases Gemini 1.5 (in beta).

In February, Google releases Gemini 1.5, which supports a context window of up to 1 million tokens. On the same day as Google’s announcement, OpenAI announces plans to release Sora, a generative AI model that generates video from text prompts.

Key terms

Artificial intelligence (AI)
Computer science processes and statistical algorithms that are able to simulate and augment human intelligence. In other words, AI describes systems capable of acquiring knowledge and applying insights to enable problem solving.

This term is primarily used by the business community.

Machine learning (ML)
A subcategory of AI that uses algorithms to identify patterns and make predictions within a set of data. This data can consist of numbers, text, or even photos. Under ideal conditions, ML allows humans to interpret data more quickly and more accurately than they would ever be able to on their own. For instance, ML can be used to anticipate consumer buying patterns based on seasonal factors, website traffic, and so forth.

This term is primarily used by the technical community.

Deep learning
A specialized form of ML that teaches computers to process data using an algorithm inspired by the human brain. Deep learning teaches computers to learn through observation, imitating the way humans gain knowledge, and helps data scientists collect, analyze, and interpret large amounts of data. Also known as deep neural learning or deep neural networking.

Deep learning is typically used for human-like predictive use cases, such as forecasting the medical outlook of a patient based on lifestyle behaviors, genetic risk factors, or environmental conditions.

Machine learning operations (MLOps)
A set of practices and principles that combine ML with software and DevOps methodologies to streamline and optimize the end-to-end process of training, developing, deploying, and maintaining AI/ML models. MLOps introduces automation and helps keep ML models accurate and up to date as new data is ingested. The goal is to build AI-enabled applications powered by ML models that provide the highest prediction accuracy.
Data science
An interdisciplinary field that leverages mathematical, statistical, and computational techniques to extract knowledge and insights from structured and unstructured data. It encompasses various processes, from data collection and cleaning to analysis and visualization, ultimately driving decision making in a wide range of domains.
Generative AI
An AI technology that relies on deep learning models trained on large data sets to create new content. Generative AI models, which are used to generate new data, stand in contrast to discriminative AI models, which are used to sort data based on differences. Examples of generative AI include drafting a website or creating an image that is similar to an existing image.

It is important to understand the difference between generative AI and predictive AI.

Predictive AI
Predictive AI is one of the most common types of AI used in business applications. It is often compared to generative AI. Predictive AI has been in use for decades and predicts or forecasts outcomes based on historical data.

Predictive AI is a more mature technology than generative AI and is widely used in a variety of sectors. For example, predictive AI use cases include product recommendations on a retail website, forecasting credit risk or fraud in the financial services industry, and identifying which patients are at the most risk for certain illnesses in the medical sector.

Foundation model
A type of ML model pretrained to perform a range of tasks. Foundation models are programmed to function with a general contextual understanding of patterns, structures, and representations. This foundational comprehension of how to communicate and identify patterns creates a baseline of knowledge that can be modified, or fine-tuned, to perform domain-specific tasks for just about any industry.

Two defining characteristics that enable foundation models to function are transfer learning and scale:

  • Transfer learning refers to the ability of a model to apply information about one situation to another and build upon its internal “knowledge.”
  • Scale refers to hardware—specifically, graphics processing units (GPUs)—that allow the model to perform multiple computations simultaneously, also known as parallel processing.
Large language model (LLM)
A type of generative AI model that utilizes ML techniques to understand and generate human language. LLMs can be incredibly valuable for companies and organizations looking to automate and enhance various aspects of communication and data processing.
Model training
The initial phase of building the AI/ML model, in which the model learns from a large dataset to understand patterns, relationships, and features in the data. Creating AI foundation models from scratch can be very resource- and time-intensive and is within reach of only a few enterprise customers.
Model inferencing
The phase in which the trained AI/ML model is put to use and can make predictions, generate text, classify data, or perform any other task it was designed for. During inference, the trained model takes in new, unseen data and produces outputs based on its learned patterns.
Retrieval-augmented generation (RAG)
An architectural pattern that enables AI foundation models to produce factually correct outputs for specialized or proprietary topics that were not part of the model’s training data. By augmenting users’ questions and prompts with relevant data retrieved from external data sources, RAG gives the model “new” (to the model) facts and details on which to base its response.
Fine-tuning
A technique that involves taking a pretrained generative AI model and further training it on a specific dataset or for a specific task. Fine-tuning requires a labeled dataset that is specific to the task to train the model with examples of input-output pairs related to that task.

Although fine-tuning requires significantly less data than the original training process, it still generally involves a large investment of resources and vast amounts of data. In general, fine-tuning is more involved and labor intensive than prompt-tuning.

Prompt-tuning
A technique used to optimize the prompts or instructions a user gives to an AI model, experimenting with different prompt formats and wording to achieve the desired results. In some cases, prompt-tuning allows organizations to adapt models and achieve “good enough” accuracy with fewer resources. Prompt-tuning is often contrasted with fine-tuning AI models, which tends to require more effort and resources.

AI and ML

  • Part of what makes the incredible growth in AI possible is Moore’s law, which states that the number of transistors in an integrated circuit doubles approximately every two years.
  • The market for AI-related spending is already large and is expected to grow substantially over the next few years.
  • AI is a branch of computer science that enables machines to perform tasks that typically require human intelligence.
  • Machine learning (ML) is a subcategory of AI that uses algorithms to identify patterns and make predictions within a set of data.

Predictive AI

  • Predictive AI is a common type of artificial intelligence system used in business applications that predicts or forecasts outcomes based on historical data.
  • Predictive AI is an integral part of many everyday activities such as conducting web searches, texting, shopping online, and engaging with video and music streaming services.
  • Data science is an interdisciplinary field that leverages mathematical, statistical, and computational techniques to extract knowledge and insights from structured and unstructured data.
  • Some of the tasks data scientists perform include data collection, data cleansing, model selection and training, and evaluation and validation of those models.
  • Examples of enterprises using predictive AI include logistics companies employing it to optimize delivery routes and prevent package theft and banks or other financial institutions using it to identify fraud, money laundering, and other financial crimes.

Generative AI

  • The Turing test is a well-known experiment designed to assess the ability of a machine to exhibit intelligent behavior that is indistinguishable from that of a human. In the Turing test, a human evaluator reviews text from a conversation between two participants: a human and a machine. A machine is said to pass the test if the evaluator cannot reliably tell the human participant from the machine.
  • Generative AI is AI technology that relies on deep learning models trained on large data sets to create new content.
  • Deep learning is a specialized form of machine learning that teaches computers to process data by using an algorithm inspired by the human brain and neural networks. Deep learning teaches computers to learn through observation, imitating the way humans gain knowledge.
  • Some examples of generative AI include AI-generated summaries of customer reviews on Amazon, chatbots such as ChatGPT generating content based on a text prompt, AI-generated highlight reels from sporting events, and Red Hat’s KCS Solution Summaries.
  • A large language model (LLM) is a type of AI program designed to understand and generate human language.
  • Creating an LLM from scratch requires a tremendous amount of money, expertise, and resources. For all but a handful of organizations, creating such a model is out of reach; most organizations are likely to start with a foundation model.
  • There are a variety of approaches an organization might use when working with a foundation model. One of the most popular approaches is retrieval-augmented generation (RAG), a method that involves getting better answers from a generative AI application by linking an LLM to an external resource.
  • There are several issues associated with LLMs, including massive energy and resource costs, a lack of transparency with regard to how models are trained, biases in model training, hallucinations, data recency, and issues related to copyright and privacy.

LLMs

Generative AI system designed to understand and generate human language

Challenges:

  • outdated knowledge: retraining models is costly, so their knowledge goes stale
  • lack of domain expertise
  • lack of data transparency
    • can lead to ethical and bias concerns
  • false information (hallucinations)
  • cost: massive resource and energy consumption

RHEL AI

Foundation model platform for generative AI and LLM use cases. Designed to be a single-server appliance offering:

  • A subset of the Granite family of LLMs
    • Specialized business use cases
  • InstructLab tooling to enhance LLMs
  • Delivery as a bootable container image with full support
  • Model IP indemnification

Components: RHEL image mode + Granite family models + InstructLab

Challenges with AI:

  • Model cost
  • Alignment complexity: it is hard for non-data scientists to tailor LLMs
  • Deployment constraints
    • tuning and serving models everywhere your data lives is difficult

Image mode

The packaging format in which RHEL AI is delivered

  • leverages containerized application workflows
  • immutable except for /var, /etc, and /home
  • follows OCI container standards
  • atomic upgrades (see the bootc sketch below)
  • no DNF
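
A minimal day-2 operations sketch, assuming the bootc tooling that RHEL image mode is built on (output and options vary by version):

bootc status        # show the currently booted image and any staged update
bootc upgrade       # fetch and stage a newer image; applied atomically on reboot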

InstructLab

A tooling framework for contributing to and fine-tuning LLMs that doesn’t require a team of data scientists

  • Prompt engineering: optimizing the prompt to get the best output from the model
    • works best when the data you need is already in the model
  • RAG: retrieval-augmented generation, retrieves info from a separate data store
    • the retrieved info is added directly into the model’s prompt
    • RAG gets the info you need and injects it into the prompt
  • Fine-tuning: adjusts a pre-trained model’s parameters

Based on LAB methodology:

  • Large-scale Alignment for chatBots
  • uses a taxonomy file structure, synthetic data generation, and a multiphase fine-tuning process (CLI sketch after the steps below)
  1. Put your data into the taxonomy file structure
  2. A teacher model creates a synthetic dataset based on the taxonomy files
  3. The teacher model also acts as a critic, validating and refining the synthetic data
  4. The refined dataset is used for fine-tuning
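
A rough sketch of how step 1 maps to the CLI, assuming the ilab tool covered later in these notes (subcommand names vary by InstructLab version):

ilab taxonomy diff        # list and validate new or changed qna.yaml files before generating data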

Granite models

FOSS with Apache 2.0

  • smaller and faster; cost-effective and a good fit for smaller businesses
  • transparent because they’re FOSS

You serve the model, then chat with it from the command line (see the sketch below)

  • if there was a hallucination, you can train the model on the correct information
  • you add a knowledge file to train the model
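
A minimal sketch of that serve-and-chat loop, assuming the ilab CLI (subcommands and default models vary by version):

ilab model download       # pull the configured model(s) locally
ilab model serve          # start an inference server for the configured model
ilab model chat           # in another terminal, chat interactively with the served model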

You can submit knowledge and skill data directly to an LLM

  • use a YAML file (example sketch below)
    • context, questions, and answers
    • create the qna.yaml file in a directory under the compositional_skills, foundational_skills, or knowledge dirs. Ex: /knowledge/companies//qna.yaml
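
An illustrative qna.yaml sketch; the directory name and values below are made up, and the exact required fields differ between skill and knowledge files and across taxonomy schema versions (knowledge files, for example, may also need document source metadata), so check the taxonomy documentation:

mkdir -p ~/.local/share/instructlab/taxonomy/knowledge/example_company        # hypothetical subdirectory
cat > ~/.local/share/instructlab/taxonomy/knowledge/example_company/qna.yaml <<'EOF'
version: 3                        # schema version; confirm against your taxonomy's docs
created_by: your_username         # hypothetical value
seed_examples:
  - context: |
      Example Company was founded in 2001 and is headquartered in Raleigh.
    questions_and_answers:
      - question: When was Example Company founded?
        answer: Example Company was founded in 2001.
EOF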

Synthetic data

Uses an LLM as a teacher model to create a new dataset.

  • uses the qna.yaml files to create a synthetic dataset (command sketch below)
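
A sketch of kicking off synthetic data generation with the ilab CLI (flags, pipeline, and teacher-model choice vary by version and hardware):

ilab data generate        # build a synthetic dataset from new qna.yaml files (output goes to the datasets dir listed later)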

Training workflow

Multiphase training workflow:

  • training can take days (command sketch below)
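
A placeholder training invocation, assuming the ilab CLI (the RHEL AI multiphase strategy and accelerator-specific flags vary by version and hardware):

ilab model train          # fine-tune the student model on the generated dataset; multiphase runs can take days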

Practical approach to RHEL AI

InstructLab

A FOSS community project focused on AI model alignment and fine-tuning.

Serves models, provides a chat interface, and evaluates model responses to queries.

  • Community version (SDG 1.0): can run on a consumer laptop (install sketch below)
  • RHEL AI version: enterprise offering, requires high-powered GPUs
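
A minimal install sketch for the community version, assuming the pip-packaged instructlab project (steps differ by OS, Python version, and accelerator):

python3 -m venv --upgrade-deps venv        # isolated Python environment
source venv/bin/activate
pip install instructlab                    # community InstructLab CLI
ilab config init                           # initialize config.yaml and the default directory layout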

InstructLab also provides a way to generate synthetic data.

Uses up to 4 AI models:

  1. Student model
  2. Teacher model
  3. Red Hat LoRA Layer
  4. Evaluation Model
ilab            # main command

$HOME/.cache/instructlab/models                     # downloaded llms, including saved and generated
$HOME/.local/share/instructlab/datasets             # output from SDG phase
$HOME/.local/share/instructlab/taxonomy             # skill and knowledge data
$HOME/.local/share/instructlab/checkpoints          # output of training process
$HOME/.local/share/instructlab/config.yaml          # config file

Hardware reqs

  • accelerator vendor

  • hardware or cloud vendor

  • CPU arch

  • Red Hat Enterprise Linux AI (RHEL AI) is a generative AI model alignment and inference server designed for hardware capable of fine-tuning full-size AI models, generating synthetic data, and integrating open-source models with new information.

  • Accelerators: GPUs (Nvidia, AMD, and Intel) power generative AI training and inference features of RHEL AI. Each server can support up to eight accelerators

  • Hardware vendors: Dell and Lenovo are certified hardware vendors for RHEL AI. Solutions from Cisco are in validation, and Hewlett Packard is on the roadmap.

  • Cloud environments: Cloud environments compatible with RHEL AI include AWS and IBM Cloud (generally available), Azure and Google Cloud (currently in technology preview). All cloud providers should be generally available by the release of RHEL AI version 1.3. Marketplace offerings are expected to follow.

  • InstructLab software: The main component running on the RHEL OS is the InstructLab software, which includes Python 3.11, the InstructLab command line interface (CLI), the LAB enhanced method of synthetic data generation (SDG), and the LAB enhanced method of single- and multi-phase training.

  • InstructLab with vLLM: A high-throughput inference and serving engine for large language models (LLMs).

  • InstructLab with DeepSpeed: Hardware optimization software that speeds up the training process, similar to FSDP.

  • InstructLab with FSDP: Training framework that makes training faster and more efficient, similar to DeepSpeed.

  • Sample taxonomy tree: RHEL AI version 1.2 includes a sample taxonomy tree with example skills and knowledge that can be downloaded and used for training a model.

  • Operating system image and installers: RHEL AI operating system image and installers are essential for setting up and running RHEL AI.

# hardware validation commands
nvidia-smi      # NVIDIA GPU/accelerator status and driver version
nproc           # number of available CPU cores
lscpu           # CPU architecture details
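
If the ilab CLI is installed, recent versions also provide a system report that complements the commands above (availability varies by version):

ilab system info          # report platform and detected accelerator details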

Containers in RHEL AI bootable image

  1. Red Hat Enterprise Linux 9.4: An image mode bootable container of the RHEL version 9.4 operating system (OS) for your machine.
  2. The InstructLab container: Contains InstructLab and various other tools required for RHEL AI, including:
    • Python version 3.11, used internally by InstructLab.
    • InstructLab tools:
      • InstructLab command line interface (CLI).
      • LAB-enhanced method of synthetic data generation (SDG).
      • LAB-enhanced method of single- and multi-phase training.
  3. InstructLab with vLLM: A high-throughput inference and serving engine for large language models (LLMs).
  4. InstructLab with DeepSpeed: A hardware optimization software that speeds up the training process, similar to FSDP.
  5. InstructLab with FSDP: A training framework that makes training faster and more efficient, similar to DeepSpeed.
  6. RHEL AI version 1.2 also includes a sample taxonomy tree with example skills and knowledge that you can download and use for training a model.

Questions

  • What is a Granite model?
    • Base model - optimized for training
    • Granite model for inference serving, optimized for chat
    • Preview of v2 model for inference serving
    • LAB fine-tuned Granite core model

https://training-lms.redhat.com/player/play?in_sessionid=10420A82A2533924&classroom_id=74397894