A (very) basic guide to artificial intelligence

english.janatakhabar.in, March 12, 2024

Intelligence is the capacity of living beings to apply what they know to solve problems. ‘Artificial intelligence’ (AI) is intelligence in a machine. There is currently no single, universally accepted definition of AI.

A simple place to begin is with AI’s materiality, as a machine-software combination.

What does the machine do?

A simple example-problem in AI is linear separability. You plot some points on a graph and then find a way to draw a straight line through the graph such that it divides the points into two distinct groups.
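
As a minimal sketch of this idea, the Python below uses the classic perceptron rule to search for such a line. The six points, their labels, and the function names are made up for illustration; this is one simple way to find a separating line, not the only one.

```python
# Toy data (made up for illustration): (x, y) points, each labelled +1 or -1.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 0.5), (4.0, 4.5), (5.0, 4.0), (4.5, 5.5)]
labels = [-1, -1, -1, 1, 1, 1]

def train_perceptron(points, labels, max_epochs=1000):
    """Search for w1, w2, b such that the sign of w1*x + w2*y + b matches every label."""
    w1 = w2 = b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            # A point is misclassified if it lies on the wrong side of the line (or on it).
            if label * (w1 * x + w2 * y + b) <= 0:
                # Tilt and shift the line towards classifying this point correctly.
                w1 += label * x
                w2 += label * y
                b += label
                mistakes += 1
        if mistakes == 0:
            # Every point is on its correct side, so the search can stop.
            return w1, w2, b
    raise ValueError("no separating line found; the data may not be linearly separable")

def side(line, point):
    """Report which side of the line a point falls on: +1 or -1."""
    w1, w2, b = line
    x, y = point
    return 1 if w1 * x + w2 * y + b > 0 else -1

line = train_perceptron(points, labels)
predictions = [side(line, p) for p in points]
print(line, predictions)
```

The line is the set of points where `w1*x + w2*y + b` equals zero. If the two groups cannot be split by any straight line, the function gives up after `max_epochs` passes instead of looping forever.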

Let’s make this problem more abstract. For example, how would a machine differentiate between a cat and a dog?

Say you give the machine 1,000 pictures of cats and 1,000 pictures of dogs, and ask it to separate them. (This task is usually not given to a linear classifier but it illustrates a point.) You also equip the machine with tools: say, a camera and an app that can measure the distances between different parts of an image, can analyse depth (using trigonometry), and can assess colours.

The machine can proceed by classifying the cat- and dog-pictures in different ways, say, by shape of the face, shape of the eyes, shape of the paw, body size, size of the tongue, fur colours, etc. Because the machine has the necessary computing power, it can plot these features two at a time on a graph. For example, the x-axis can represent the slope of the face and the y-axis the length of the paw. Or it can plot them three at a time in a 3D graph.

In all these cases, you watch until the machine has found a way to separate the pictures into two groups such that one group is mostly cats and the other is mostly dogs. At this point, you stop the machine.

How hard is decision-making?

Sometimes, it’s very easy to separate a given dataset into two groups, like sorting marbles that differ in, say, a single obvious property, where you can make very reliable decisions with just one dimension, or parameter. Sometimes it’s more difficult, like with the cats and dogs, where you may need around a dozen parameters.

Sometimes it is harder still, like asking the computer on a driverless car to determine whether it should apply the brake based on how fast a bird is flying in front of the car. The set of outcomes on one side of the line stands for ‘no’ and the set on the other side stands for ‘yes’, and solving for this will require hundreds of parameters.

These parameters will also have to account for the context of decision-making. For example, if the person in the car is in a hurry to get to a hospital, is killing the bird okay? Or if the person in the car is not in a hurry, how quickly should the car brake? Etc.

Sometimes it’s just mind-boggling. For example, ChatGPT is able to accept an input question from a user, make ‘sense’ of it, and answer accordingly. This ‘sense’ comes from its training corpus — the billions of sequences of words and sentences scraped from the internet.

In particular, ChatGPT learnt not by classifying words but by predicting the next word in a given sentence. More particularly, large language models (LLMs) like ChatGPT generate the text response without classifying it or relating the question to similar examples. (This is why generative AI is different from a classification model, which is like a sorting machine.)

LLMs like ChatGPT are trained on a large corpus of text, in which the AI is repeatedly shown a stretch of text and tasked with predicting the word that comes next. (Some other language models are instead trained by randomly replacing words with blanks and filling them in.) And while learning to predict the next word correctly, the AI also learns something about the process that created the text, which is the real world.
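
Next-word prediction can be sketched at toy scale by counting which word follows which in a small text. This is only an illustration under loud assumptions: the corpus and function names below are made up, and real LLMs use neural networks with billions of parameters, not a lookup table of counts.

```python
from collections import Counter, defaultdict

# Hypothetical miniature "training corpus".
corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."

def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following

def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # the word seen most often after "the"
print(predict_next(model, "sat"))
```

Even this crude model picks up a little of the structure of its text: ‘sat’ is always followed by ‘on’. Scaling the same idea up to far longer contexts and far more text is, loosely, what gives an LLM its ‘sense’.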

ChatGPT is so good in part because of its scale: GPT-3, the model family it grew out of, has 175 billion parameters.

What are some types of machine-learning?

Separating data with a straight line is a fairly simple problem in machine-learning. There are many algorithms that serve this purpose, and some of them are very complex.

There are three main ways in which ‘machines’ can be classified depending on the way they learn: supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, the data is labelled (e.g. in a table, the row and column titles are provided and datatypes, such as numbers, verbs and names, are pointed out; in the cat-and-dog example, each picture would be tagged ‘cat’ or ‘dog’). In unsupervised learning, this information is withheld, forcing the machine to work out how the data can be organised and then solve a problem. In reinforcement learning, the machine learns and solves problems on its own while engineers score its output, and it adjusts itself based on the scores.
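
The difference between the first two approaches can be sketched with a single made-up measurement, body weight. The supervised learner below is given the labels and averages each class. The unsupervised learner (a simple two-group clustering) sees only the numbers and has to find the groups itself. All values and function names are hypothetical, and reinforcement learning is not shown.

```python
# Made-up body weights in kg; the labels are available only to the supervised learner.
weights = [3.5, 4.0, 4.5, 5.0, 20.0, 25.0, 30.0, 35.0]
labels = ["cat", "cat", "cat", "cat", "dog", "dog", "dog", "dog"]

def mean(values):
    return sum(values) / len(values)

def fit_supervised(values, labels):
    """Supervised: use the labels to compute one average value per class."""
    return {name: mean([v for v, l in zip(values, labels) if l == name])
            for name in set(labels)}

def classify(centres, value):
    """Assign a value to the class whose average is closest."""
    return min(centres, key=lambda name: abs(centres[name] - value))

def fit_unsupervised(values, rounds=10):
    """Unsupervised: split the values into two groups without ever seeing a label."""
    low, high = min(values), max(values)  # initial guesses for the two group centres
    for _ in range(rounds):
        # Assign every value to the nearer centre, then move each centre to its group's mean.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(group_low), mean(group_high)
    return sorted(group_low), sorted(group_high)

centres = fit_supervised(weights, labels)
print(classify(centres, 6.0))
print(fit_unsupervised(weights))
```

The unsupervised learner recovers the same two groups here, but it cannot name them: it knows only that there are two clusters, not which one is ‘cat’.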

The way in which information flows inside the machine is governed by artificial neural networks (ANNs), the software that ‘animates’ the hardware.

What is an artificial neural network?

An ANN comprises computing units, or nodes, connected together in such a way that the whole network learns in a manner loosely modelled on an animal brain. The nodes mimic neurons and the connections between nodes mimic synapses. Every ANN has two important components: activation functions and weights.

The activation function is an algorithm that runs at a node. Its job is to accept the inputs from other nodes to which it is connected and compute an output. The inputs and outputs are in the form of real numbers.

The weight refers to the ‘importance’ an activation function gives to a particular input. For example, say there are different nodes to estimate the fur colour, tail length, and dental profile in a given photo of a cat or a dog. All these nodes provide their outputs as inputs to a node responsible for separating ‘cat’ from ‘dog’. This way, the nodes can be ‘taught’ to adjust their outcomes by adjusting the relative weights they assign to different inputs.
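
A single node of this kind can be sketched in a few lines of Python. The sigmoid activation, the squared-error training step, the learning rate, and all the numbers below are assumptions chosen for illustration. Real networks contain many such nodes and are trained on many examples at once.

```python
import math

def sigmoid(z):
    """A common activation function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    """Weighted sum of the inputs, passed through the activation function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

def train_step(inputs, weights, bias, target, learning_rate=0.5):
    """Nudge the weights so the node's output moves closer to the target (one gradient step)."""
    output = node_output(inputs, weights, bias)
    # Gradient of the squared error 0.5 * (output - target)**2 with respect to the weighted sum.
    delta = (output - target) * output * (1.0 - output)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * delta
    return new_weights, new_bias

# Hypothetical scores from three upstream nodes: fur colour, tail length, dental profile.
inputs = [0.9, 0.2, 0.4]
weights, bias = [0.1, 0.1, 0.1], 0.0
target = 1.0  # assume 1.0 stands for 'dog' and 0.0 for 'cat'

before = node_output(inputs, weights, bias)
for _ in range(100):
    weights, bias = train_step(inputs, weights, bias, target)
after = node_output(inputs, weights, bias)
print(before, after, weights)
```

After training, the weight on the strongest input has grown the most, which is the sense in which the node has learnt to treat that input as more ‘important’.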

While nodes are computing units, the ANN itself is not a physical entity. It is mathematical. A node is the ‘site’ of a mathematical function. Put another way, the ANN is like an algorithm that passes information from one activation function to the next in a specific order. The functions modify the information they receive in different ways.

What are transformers?

Transformers are a specialised type of ANN. They are easy to train in parallel, unlike the ANN architectures that preceded them. This is how, for example, ChatGPT could be trained on a vast amount of text from the web.

Here, the ANN is broken up into two parts: the encoder and the decoder. Say an ANN is required to recognise the presence of a cat in a photograph. The encoder accepts the photograph, breaks it up into small pieces (say, 10 x 10 pixels), and encodes the visual information as numerical data (e.g. 0s and 1s). The decoder accepts this data and processes the numbers to reconstruct the information content in the photograph.
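
The first step described here, breaking a picture into small pieces and representing each as numbers, can be sketched as below. The 4 x 4 ‘photograph’ and the function name are made up. A real encoder would go on to transform these numbers using learnt weights, which this sketch does not attempt.

```python
def split_into_patches(image, patch_size):
    """Cut a 2D grid of pixel values into square patches, each flattened into a list of numbers."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be multiples of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row, left to right.
            patch = [image[row][col]
                     for row in range(top, top + patch_size)
                     for col in range(left, left + patch_size)]
            patches.append(patch)
    return patches

# A made-up 4 x 4 greyscale 'photograph'; real inputs are far larger.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
patches = split_into_patches(image, 2)
print(patches)
```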

The transformer architecture, originally developed at Google and released in 2017, is built around a mechanism called attention, which lets an ANN weigh how relevant each part of the input data is to the others. It has better performance as a result.
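
The core calculation, usually called scaled dot-product attention, can be sketched in plain Python. This is a simplified illustration: the three input vectors are made up, and the sketch uses them directly as queries, keys and values, whereas a real transformer first passes the input through learnt weight matrices.

```python
import math

def softmax(scores):
    """Turn a list of scores into positive weights that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How relevant is each input position to this query?
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Blend the value vectors according to those relevance weights.
        blended = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

# Three made-up 2-number vectors standing in for three words of input.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(vectors, vectors, vectors)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how much one position ‘attends’ to every position in the input. Because each row is computed independently of the others, the work is easy to run in parallel.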

The advent of transformers revolutionised machines’ ability to translate long, complicated sentences.

What are GPUs?

The GPU is the physical processor that ‘runs’ the ANN. It was originally developed to render graphics for video games. It was better at this task than other processors at the time because it was designed to run computing tasks in parallel. It has since been widely adopted as the basic computing unit for ANNs for the same reason.

The company Nvidia has emerged as a technology giant since AI started becoming more popular because of its production of GPUs. Nvidia’s valuation was the fastest in history to go from $1 trillion to $2 trillion (in nine months). Most other companies that have been building large AI models are using Nvidia’s GPU-based chips to do so.

In a 2023 analysis, financial services provider Seeking Alpha wrote that Nvidia’s overwhelming market share has stoked “resistance” in three ways: competitors are trying to develop and switch to non-GPU hardware; researchers are building smaller learning models (with smaller ANNs) that require fewer resources than a top-shelf Nvidia chip to run; and developers are building new software to sidestep dependency on specific hardware.

The author is grateful to Viraj Kulkarni for inputs.