2  AI for Everyone: An Introductory Overview

Goals

This session aims to introduce AI to a non-specialist audience, ensuring that participants from any background can understand these essential concepts. The focus will be on explaining key terminology and the basic principles of machine learning and deep learning. By the end of this session, participants will have a solid foundational knowledge of key AI concepts, enabling them to better appreciate and engage with more advanced topics in the following sessions.

2.1 The Foundations of AI

Before reading the definition, take a moment: What do you think AI is? How would you define it?

What is AI?

Artificial Intelligence (AI) refers to the development of computer systems capable of performing tasks that typically require cognitive functions associated with human intelligence, such as recognizing patterns, learning from data, and making predictions.

But… there is a minor issue with this definition. What exactly is human intelligence?

Recognizing patterns, learning, and making predictions are all functions of intelligence, but what lies at the core of a “conscious human”? Why is self-awareness important in cognition, and what evolutionary function does subjective, conscious experience serve?

In the philosophy of mind, this phenomenon is referred to as qualia, and there is still no definitive scientific answer to why qualia exist—at least, not yet (see theories of consciousness for more information).

But today, let’s focus on a simpler question. How do humans think?

Before the 1950s–1960s, many scientists believed that humans think through a series of if/else rules (e.g., "If I drink more coffee, I'll be jittery," or "If a seagull spots my pizza, it'll try to snatch a bite"). Geoffrey Hinton, a cognitive psychologist and computer scientist, was one of the advocates of an opposing idea: that humans think experientially and probabilistically. For instance, based on today's cloud cover and similar past experiences, there is a high probability of rain, so I'll grab an umbrella. This idea laid the foundation for probabilistic algorithms and, ultimately, the field of Machine Learning.

2.2 Machine Learning

To quickly recap, AI is a broad term encompassing efforts to replicate aspects of human cognition. Machine Learning is a subset of AI that focuses on algorithms enabling computers to learn from data and build probabilistic models.

What is ML?

Machine Learning (ML) is a subset of AI that specifically focuses on algorithms that allow computers to learn from data and create probabilistic models.

Source: Original comic by sandserif

Machine Learning includes various types and techniques, but in this workshop we’ll primarily focus on Neural Networks (NNs).

2.3 Neural Networks

NNs are loosely inspired by the structure of the human brain and consist of interconnected nodes, or neurons, that process information. The principle “neurons that fire together, wire together” [1] captures the idea that the strength of their connections, known as weights, adjusts based on experience.

Figure 2.1: The parts of a neuron: a cell body with a nucleus, branching dendrites, and a long axon connecting to thousands of other neurons at synapses. Source: [2]

Figure 2.2: Structure of a neural network: Ramón y Cajal's drawing of the cells of the chick cerebellum, from Estructura de los centros nerviosos de las aves, Madrid, 1905

What is a NN?

A Neural Network (NN) is a foundational technique within the field of machine learning. NNs are designed to simulate the way the human brain processes information by using a series of connected layers of neurons that transform and interpret input data.
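To make this concrete, here is a minimal sketch of a single artificial neuron in Python with NumPy (the inputs, weights, and bias below are made-up numbers for illustration):

```python
import numpy as np

def neuron(x, w, b):
    """A single artificial neuron: a weighted sum of the inputs plus a bias,
    squashed through a nonlinear activation (here, the sigmoid)."""
    z = np.dot(w, x) + b             # each weight sets the strength of one connection
    return 1.0 / (1.0 + np.exp(-z))  # map the result into the range (0, 1)

x = np.array([0.5, -1.2, 3.0])  # input signals
w = np.array([0.4, 0.1, -0.7])  # connection weights (adjusted during training)
b = 0.1                         # bias term
print(neuron(x, w, b))          # the neuron's output "activation"
```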

The Perceptron [3], one of the earliest neural network models, was invented in 1957 by psychologist Frank Rosenblatt, who unfortunately did not live long enough to witness the far-reaching impact of his work. Rosenblatt’s Perceptron was a physical machine with retina-like sensors as inputs, wires acting as the hidden layers, and a binary output system. This invention marked the early stages of artificial intelligence, laying the groundwork for the powerful neural networks we use today.
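The perceptron's learning rule is simple enough to sketch in a few lines. The Python version below is a modern software rendering, not Rosenblatt's original hardware, and learning the logical AND function is just a toy illustration:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron learning rule: nudge the weights whenever the
    binary prediction disagrees with the label (y is 0 or 1)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if np.dot(w, xi) + b > 0 else 0  # binary output unit
            w += lr * (yi - pred) * xi                # updates happen only on mistakes
            b += lr * (yi - pred)
    return w, b

# Toy task: learn the logical AND of two inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```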

2.4 Exercise: NN Playground

Web-based app, no setup or account required: playground.tensorflow.org


Level 1: Browse around

  • Orange indicates negative values, while blue represents positive values.
  • An 80/20 split between training and test data is typical. Smaller datasets may need a 90% training share to provide enough examples, while larger datasets can afford to shift more data into the test set.
  • Background colors show the network's predictions, with more intense colors indicating higher confidence.
  • Adding noise during training pushes the model to learn the true underlying patterns rather than memorizing the data, making it more robust on noisy real-world inputs.

Level 2: Things to try

Hidden Layers

Hidden Layers are the layers that are neither input nor output. You can think of the values computed at each layer of the network as a different representation for the input X. Each layer transforms the representation produced by the preceding layer to produce a new representation.

Start with one hidden layer and one or two neurons, observing predictions (orange vs. blue background) against actual data points (orange vs. blue dots). With minimal layers and neurons, predictions are often inaccurate. Increasing hidden layers and neurons improves alignment with the actual data, illustrating how added complexity helps the model learn and approximate complex patterns more accurately.

Level 3: More things to try!

Weights

Weights are parameters within the neural network that transform input data as it passes through layers. They determine the strength of connections between neurons, with each weight adjusting how much influence one neuron has on another. During training, the network adjusts these weights to reduce errors in predictions.

As you press the play button, you can see the number of epochs increase. In an Artificial Neural Network, an epoch represents one complete pass through the training dataset.

Learning rate

The learning rate is a key setting or hyperparameter that controls how much a model adjusts its weights during training. A higher rate speeds up learning but risks overshooting the optimal solution, while a lower rate makes learning more precise but slower. It’s one of the most crucial settings when building a neural network.
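Weights, epochs, and the learning rate all come together in the training loop. Here is a deliberately tiny sketch: gradient descent on a single weight, where you can change learning_rate (try 0.5 or 0.001) to watch overshooting or slow learning for yourself (all numbers are invented for illustration):

```python
# Gradient descent on one weight, minimizing the squared error
# L(w) = (w * x - target)**2 for a one-input, one-weight "network".
x, target = 2.0, 8.0    # so the optimal weight is 4.0
w = 0.0
learning_rate = 0.05    # higher -> faster but risks overshooting; lower -> precise but slow

for epoch in range(50):            # one epoch = one full pass over the (tiny) dataset
    error = w * x - target         # how wrong is the current prediction?
    gradient = 2 * error * x       # dL/dw: the direction that increases the error
    w -= learning_rate * gradient  # step the weight against the gradient
print(w)  # approaches 4.0
```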

2.5 Backpropagation

Initially, neural networks were quite shallow feed-forward networks. Adding more hidden layers made training them difficult. However, in the 1980s—often referred to as the rebirth of AI—the invention of the backpropagation algorithm revolutionized the field. It allowed for efficient error correction and gradient calculation across layers, making it possible to train much deeper networks than before.

What is backpropagation?

Backpropagation is an algorithm that calculates the error at the output layer of a neural network and then “back propagates” this error through the network, layer by layer. It updates the connections (weights) between neurons to reduce the error, allowing the model to improve its accuracy during training.
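Here is a minimal sketch of backpropagation in NumPy, training a tiny two-layer network on the classic XOR problem (the layer sizes, learning rate, and step count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1 / (1 + np.exp(-z))

# Tiny network (2 inputs -> 4 hidden neurons -> 1 output) trained on XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

for step in range(10000):
    # Forward pass: compute the prediction layer by layer.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: propagate the output error back through the layers.
    d_out = (out - y) * out * (1 - out)  # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)   # error signal at the hidden layer
    # Gradient-descent updates: adjust the weights to reduce the error.
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2))  # predictions move toward [0, 1, 1, 0]
```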

Figure 2.3: Backpropagation

Figure 2.4: Source: 3Blue1Brown

Thus, the backpropagation algorithm enabled the training of neural networks with multiple layers, laying the foundation for the field of deep learning.

2.6 Deep Learning

Deep Learning

Deep Learning (DL) is a subset of ML that uses multilayered neural networks, called deep neural networks.

Figure 2.5: (a) A shallow model, such as linear regression, has short computation paths between inputs and output. (b) A decision list network has some long paths for some possible input values, but most paths are short. (c) A deep learning network has longer computation paths, allowing each variable to interact with all the others. Source: Artificial Intelligence: A Modern Approach [2]

Deep learning (DL) techniques are typically classified into three categories: supervised, semi-supervised, and unsupervised. Additionally, reinforcement learning (RL) is often described as partially supervised, since the agent learns from reward signals rather than explicit labels, and it sometimes overlaps with unsupervised methods.

Supervised Learning involves learning from labeled data, where models directly learn from input-output pairs. Common examples include Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Transformers. These models are generally simpler in terms of training and achieve high performance.

Semi-Supervised Learning combines a small amount of labeled data with a large amount of unlabeled data, often using auto-labeling techniques. Examples include Self-training models, where a model iteratively labels data to improve, and Graph Neural Networks (GNNs), which are useful for understanding relationships between data points.

Unsupervised Learning relies on unlabeled data, focusing on identifying patterns or structures. Popular models include Autoencoders, Generative Adversarial Networks (GANs), and Restricted Boltzmann Machines (RBMs).

Figure 2.6: Organogram of AI algorithms.

Despite advances in backpropagation, deep learning, computing power, and optimization, neural networks still face the problem known as catastrophic forgetting: losing old knowledge when trained on new tasks. Current AI models are often "frozen" and specialized, needing complete retraining for updates, unlike even simple animals that can continuously learn without forgetting [4]. This limitation is one of the reasons that led to the development of specialized deep learning models, each with a unique architecture tailored to specific tasks. Let's explore how each of these models can be applied in scientific research!

Convolutional Neural Networks (CNN) are artificial neural networks designed to process structured data like images. Originally inspired by the mammalian visual cortex, CNNs attempt to mimic how our brains process visual information in layers: the brain first interprets basic shapes like lines and edges, then moves on to more complex structures, like recognizing a dog's ear or the whole animal. This feature, known as "invariance," allows us to recognize objects even if they are rotated or appear in a different part of our visual field.

CNNs simulate this by using small, repeated filters, or kernels, that scan parts of an image to find basic shapes, edges, and textures, regardless of their location. This scanning process, called convolution, enables early CNN layers to detect simple patterns (like lines) and deeper layers to identify more complex shapes or objects.
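The convolution operation itself is only a few lines of code. The sketch below applies a hand-made "vertical edge" kernel to a toy image; real CNNs learn their kernels during training:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a small filter (kernel) over the image and record how
    strongly each patch matches it -- the core CNN operation."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge detector: responds where pixel values change left to right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
print(convolve2d(image, kernel))  # strong response at the edge column
```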

Application: CNNs are highly effective for image-related tasks, making them ideal for analyzing satellite or drone imagery in ecology, identifying structures in biomedical imaging, and classifying galaxies in astrophysics.

Figure 2.7: Convolutional Neural Networks. Source: Bennett, 2023 [4]

Limitations of CNNs

Ironically, though CNNs were inspired by the mammalian visual system, they struggle with tasks that even simpler animals like fish handle easily: CNNs have trouble with rotated or differently angled objects. The current workaround is to augment the training data with images of each object from many different angles [4].

While CNNs follow a layered structure, recent research reveals that the brain’s visual processing is more flexible and not as hierarchical as once believed. Our visual system can “skip” layers or process information in parallel, allowing simultaneous handling of different types of visual input across various brain regions.

Recurrent Neural Networks (RNN) are artificial neural networks designed to process sequential data. By incorporating cycles in their computation graph, RNNs can “remember” previous inputs, making them especially useful for tasks where context is important.
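A single step of a vanilla RNN can be sketched as follows (the weights are random for illustration; a trained network would learn them):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    """One step of a vanilla RNN: the new hidden state mixes the current
    input with the previous hidden state, giving the network 'memory'."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W_x = rng.normal(size=(hidden, inputs))
W_h = rng.normal(size=(hidden, hidden))
b = np.zeros(hidden)

h = np.zeros(hidden)                     # empty memory to start
sequence = rng.normal(size=(5, inputs))  # e.g., 5 time steps of sensor readings
for x_t in sequence:                     # the same weights are reused at every step
    h = rnn_step(x_t, h, W_x, W_h, b)
print(h)  # the final hidden state summarizes the whole sequence
```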

Application: These models are commonly used for time series data, such as weather forecasting, monitoring ecological changes over time, and analyzing temporal patterns in genomic data.

Figure 2.8: Recurrent Neural Network. Source: dataaspirant.com

Reinforcement Learning (RL) is a learning technique in which an agent interacts with the environment and periodically receives rewards (reinforcements) or penalties to achieve a goal.

With supervised learning, an agent learns by passively observing example input/output pairs provided by a "teacher." Reinforcement Learning is one attempt to address the catastrophic forgetting problem and to build AI agents that can actively learn from their own experience in a given environment [2].

Figure 2.9: Reinforcement Learning

Applications: RL is applied in robotics and can also assist with experiment simulation in science, environmental monitoring, autonomous driving, and creating AI opponents in gaming.
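To see the reward-driven loop in action, here is a minimal sketch of Q-learning (one classic RL algorithm among many) in an invented five-state corridor where the agent must walk right to reach a reward:

```python
import numpy as np

# Invented toy world: states 0..4 in a row, start at 0, reward only at state 4.
# Actions: 0 = step left, 1 = step right.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))  # value estimates, learned from experience
alpha, gamma, epsilon = 0.5, 0.9, 0.3
rng = np.random.default_rng(0)

for episode in range(200):
    state = 0
    while state != 4:  # keep acting until the goal is reached
        # Explore occasionally; otherwise exploit the best-known action.
        if rng.random() < epsilon:
            action = rng.integers(n_actions)
        else:
            action = np.argmax(Q[state])
        next_state = max(0, state - 1) if action == 0 else min(4, state + 1)
        reward = 1.0 if next_state == 4 else 0.0  # the environment's feedback
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state

print(np.argmax(Q[:4], axis=1))  # learned policy in states 0-3: step right (1)
```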

One example of RL is the Actor-Critic model, which divides the learning process into two roles: the Actor, who explores the environment and makes decisions, and the Critic, who evaluates these actions. The Critic provides feedback on the quality of each action, helping the Actor balance exploration (trying new actions) with exploitation (choosing actions with known rewards). Recent research has explored various algorithms to model curiosity in artificial agents [5] [6].
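Here is a heavily simplified sketch of the actor-critic idea on a two-armed bandit (the environment and all constants are invented for illustration): the Critic tracks the expected reward, and the Actor shifts probability toward actions that beat that expectation.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rewards = [0.2, 0.8]  # arm 1 pays more on average

prefs = np.zeros(2)  # Actor: action preferences (turned into a softmax policy)
value = 0.0          # Critic: running estimate of the expected reward

for step in range(2000):
    probs = np.exp(prefs) / np.exp(prefs).sum()     # policy: action probabilities
    action = rng.choice(2, p=probs)                 # Actor explores stochastically
    reward = rng.normal(true_rewards[action], 0.1)  # feedback from the environment
    td_error = reward - value                       # Critic: better or worse than expected?
    value += 0.05 * td_error                        # Critic refines its estimate
    prefs += 0.1 * td_error * (np.eye(2)[action] - probs)  # Actor reinforces pleasant surprises

print(probs.round(2))  # the policy shifts toward the better arm
```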

Large Language Models (LLM) are a type of neural network that has revolutionized natural language processing (NLP). Trained on massive datasets, these models can generate human-like text, translate languages, create various forms of content, and answer questions informatively (e.g., GPT-3, Gemini, Llama).

Figure 2.10: Large Language Models. Source: Artificial Intelligence: A Modern Approach [2]

Transformers, introduced in 2017, revolutionized NLP with a mechanism called self-attention. Unlike earlier models such as RNNs, which process language sequentially, Transformers use self-attention to assess the relationships between all words in a sentence simultaneously. This lets them dynamically focus on different parts of the input text, weigh the importance of each word in relation to the others, and so capture context and meaning with much better accuracy.
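Self-attention itself reduces to a few matrix operations. Here is a minimal single-head sketch with random embeddings and projection matrices (real Transformers learn these and add much more machinery around them):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token looks at every
    other token and takes a weighted average of their values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])  # how relevant is each token to each other token?
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax per row
    return weights @ V                      # a context-aware representation of each token

rng = np.random.default_rng(0)
n_tokens, d = 4, 8  # e.g., a 4-word sentence with 8-dimensional embeddings
X = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): one updated vector per token
```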

Figure 2.11: Evolution of Natural Language Processing. Source: [2]

The success of LLMs has driven AI's recent surge in popularity and research. Between 2010 and 2022, the volume of AI-related publications nearly tripled, climbing from about 88,000 in 2010 to over 240,000 by 2022. Likewise, AI patent filings have skyrocketed, increasing from roughly 3,500 in 2010 to over 190,000 in 2022. In the first half of 2024 alone, AI and machine learning companies in the United States attracted $38.6 billion in investment out of a total of $93.4 billion [7].

2.7 AI Beyond Machine Learning

Within the field of AI, there are many techniques that don’t rely on ML principles.

  • If/Else or Rule-Based Systems: collections of predefined rules or conditions (if statements) used to make decisions.
  • Symbolic AI (Logic-Based AI): logical rules and symbols that represent knowledge, focusing on reasoning through deductive logic.
  • Genetic Algorithms (Evolutionary Algorithms): optimization algorithms inspired by natural selection.
  • Fuzzy Logic: a form of logic that works with "degrees of truth," useful for uncertain or ambiguous scenarios.
  • Knowledge Representation and Reasoning (KR&R): techniques for structuring and processing information, often using ontologies and semantic networks.
  • Bayesian Networks: probabilistic graphical models that represent relationships between variables.
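For contrast with ML, here is a tiny sketch of the first technique in the list, a rule-based system: the behavior is fully hand-written and nothing is learned from data (the thermostat rules are invented for illustration):

```python
# Each rule is a (condition, action) pair checked in order -- no learning involved.
rules = [
    (lambda obs: obs["temperature"] > 30, "turn on cooling"),
    (lambda obs: obs["temperature"] < 15, "turn on heating"),
    (lambda obs: obs["humidity"] > 0.8, "run dehumidifier"),
]

def decide(observation):
    for condition, action in rules:
        if condition(observation):  # fire the first rule that matches
            return action
    return "do nothing"

print(decide({"temperature": 32, "humidity": 0.5}))  # turn on cooling
```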

Recent research increasingly combines various AI paradigms, such as symbolic AI and Knowledge Representation and Reasoning (KR&R), with Machine Learning (ML) to achieve a higher level of effectiveness tailored to specific tasks.

2.8 The Future of AI in Science

AI is transforming the scientific method by supporting each step of scientific discovery. Let’s consider how various AI techniques can be applied at each stage of the scientific process:

  • Observation: Using computer vision for data collection.
  • Hypothesis: Clustering data with unsupervised learning.
  • Experiment: Simulating environments through reinforcement learning.
  • Data Analysis: Simplifying data with principal component analysis (PCA) and classifying insights using neural networks or support vector machines (SVMs); see the sketch after this list.
  • Conclusion: Combining LLMs with KR&R to generate complex findings and insights.
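As an example of the data-analysis step, here is a short sketch assuming scikit-learn is installed, using the built-in Iris dataset as a stand-in for real measurements:

```python
# Compress measurements with PCA, then classify them with an SVM.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for your own measurements
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pca = PCA(n_components=2)          # simplify: 4 features -> 2 components
X_train_2d = pca.fit_transform(X_train)
X_test_2d = pca.transform(X_test)  # apply the same projection to the test set

clf = SVC().fit(X_train_2d, y_train)  # classify in the reduced space
print(clf.score(X_test_2d, y_test))   # accuracy on held-out data
```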

In conclusion, regardless of model type, high-quality data is essential for accurate AI predictions and insights. In the next sessions, we’ll explore practical tips for working with well-prepared, high-quality data.

Do You Have Any Questions?

Feel free to reach out!

Email: alyonak@nceas.ucsb.edu

Website: alonakosobokova.com

YouTube: Dork Matter Girl