
Introduction to Kubernetes
Up until now you’ve learned about Docker containers and how they solve the “works on my machine” problem. But once your projects involve multiple containers running 24/7, new challenges appear, ones Docker alone doesn’t solve.
In this tutorial, you’ll discover why Kubernetes exists and get hands-on experience with its core concepts. We’ll start by understanding a common problem that developers face, then see how Kubernetes solves it.
By the end of this tutorial, you’ll be able to:
- Explain what problems Kubernetes solves and why it exists
- Understand the core components: clusters, nodes, pods, and deployments
- Set up a local Kubernetes environment
- Deploy a simple application and see self-healing in action
- Know when you might choose Kubernetes over Docker alone
Why Does Kubernetes Exist?
Let’s imagine a realistic scenario that shows why you might need more than just Docker.
You’re building a data pipeline with two main components:
- A PostgreSQL database that stores your processed data
- A Python ETL script that runs every hour to process new data
Using Docker, you’ve containerized both components and they work perfectly on your laptop. But now you need to deploy this to a production server where it needs to run reliably 24/7.
Here’s where things get tricky:
What happens if your ETL container crashes? With Docker alone, it just stays crashed until someone manually restarts it. You could configure VM-level monitoring and auto-restart scripts, but now you’re building container management infrastructure yourself.
What if the server fails? You’d need to recreate everything on a new server. Again, you could write scripts to automate this, but you’re essentially rebuilding what container orchestration platforms already provide.
The core issue is that you end up writing custom infrastructure code to handle container failures, scaling, and deployments across multiple machines.
This works fine for simple deployments, but becomes complex when you need:
- Application-level health checks and recovery
- Coordinated deployments across multiple services
- Dynamic scaling based on actual workload metrics
How Kubernetes Helps
Before we get into how Kubernetes helps, it’s important to understand that it doesn’t replace Docker. You still use Docker to build container images. What Kubernetes adds is a way to run, manage, and scale those containers automatically in production.
Kubernetes acts like an intelligent supervisor for your containers. Instead of telling Docker exactly what to do (“run this container”), you tell Kubernetes what you want the end result to look like (“always keep my ETL pipeline running”), and it figures out how to make that happen.
If your ETL container crashes, Kubernetes automatically starts a new one. If the entire server fails, Kubernetes can move your containers to a different server. If you need to handle more data, Kubernetes can run multiple copies of your ETL script in parallel.
The key difference is that Kubernetes shifts you from manual container management to automated container management.
The tradeoff? Kubernetes adds complexity, so for single-machine projects Docker Compose is often simpler. But for systems that need to run reliably over time and scale, the complexity is worth it.
How Kubernetes Thinks
To use Kubernetes effectively, you need to understand how it approaches container management differently than Docker.
When you use Docker directly, you think in imperative terms, meaning that you give specific commands about exactly what should happen:
docker run -d --name my-database -e POSTGRES_PASSWORD=mysecretpassword postgres:13
docker run -d --name my-etl-script -v "$PWD":/app python:3.9 python /app/my-script.py
You’re telling Docker exactly which containers to start, where to start them, and what to call them.
Kubernetes, on the other hand, uses a declarative approach. This means you describe what you want the final state to look like, and Kubernetes figures out how to achieve and maintain that state. For example: “I want a PostgreSQL database to always be running” or “I want my ETL script to run reliably.”
This shift from “do this specific thing” to “maintain this desired state” is fundamental to how Kubernetes operates.
Here’s how Kubernetes maintains your desired state:
- You declare what you want using configuration files or commands
- Kubernetes stores your desired state in its database
- Controllers continuously monitor the actual state vs. desired state
- When they differ, Kubernetes takes action to fix the discrepancy
- This process repeats every few seconds, forever
This means that if something breaks your containers, Kubernetes will automatically detect the problem and fix it without you having to intervene.
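To make this loop concrete, here's a toy version of it in shell. It's purely illustrative (real controllers are programs running inside the cluster, and the app=my-app label is a hypothetical placeholder), but once you have a cluster running (we'll set one up shortly), it shows the compare-and-correct rhythm:
# Toy reconciliation loop: compare desired state vs. actual state
DESIRED=1
while true; do
  # Count Running Pods that match a hypothetical label
  ACTUAL=$(kubectl get pods -l app=my-app --no-headers 2>/dev/null | grep -c " Running ")
  if [ "$ACTUAL" -ne "$DESIRED" ]; then
    echo "Mismatch: desired=$DESIRED, actual=$ACTUAL -> a controller would create or remove Pods here"
  fi
  sleep 5  # then check again, forever
done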
Core Building Blocks
Kubernetes organizes everything using several key concepts. We’ll discuss the foundational building blocks here, and address more nuanced and complex concepts in a later tutorial.
Cluster
A cluster is a group of machines that work together as a single system. Think of it as your pool of computing resources that Kubernetes can use to run your applications. The important thing to understand is that you don’t usually care which specific machine runs your application. Kubernetes handles the placement automatically based on available resources.
Nodes
Nodes are the individual machines (physical or virtual) in your cluster where your containers actually run. You’ll mostly interact with the cluster as a whole rather than individual nodes, but it’s helpful to understand that your containers are ultimately running on these machines.
Note: We’ll cover the details of how nodes work in a later tutorial. For now, just think of them as the computing resources that make up your cluster.
Pods: Kubernetes’ Atomic Unit
Here’s where Kubernetes differs significantly from Docker. While Docker thinks in terms of individual containers, Kubernetes’ smallest deployable unit is called a Pod.
A Pod typically contains:
- At least one container
- Shared networking so containers in the Pod can communicate using localhost
- Shared storage volumes that all containers in the Pod can access
Most of the time, you’ll have one container per Pod, but the Pod abstraction gives Kubernetes a consistent way to manage containers along with their networking and storage needs.
Pods are ephemeral, meaning they come and go. When a Pod fails or gets updated, Kubernetes replaces it with a new one. This is why you rarely work with individual Pods directly in production (we’ll cover how applications communicate with each other in a future tutorial).
Deployments: Managing Pod Lifecycles
Since Pods are ephemeral, you need a way to ensure your application keeps running even when individual Pods fail. This is where Deployments come in.
A Deployment is like a blueprint that tells Kubernetes:
- What container image to use for your application
- How many copies (replicas) you want running
- How to handle updates when you deploy new versions
When you create a Deployment, Kubernetes automatically creates the specified number of Pods. If a Pod crashes or gets deleted, the Deployment immediately creates a replacement. If you want to update your application, the Deployment can perform a rolling update, replacing old Pods one at a time with new versions. This is the key to Kubernetes’ self-healing behavior: Deployments continuously monitor the actual number of running Pods and work to match your desired number.
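To give you a feel for what that blueprint looks like in practice, here's a minimal sketch of a Deployment written as a configuration file and applied straight from the shell. The names are illustrative, and in the hands-on section below we'll use a quicker command-line shortcut instead:
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-database
spec:
  replicas: 1                  # how many copies you want running
  selector:
    matchLabels:
      app: my-database
  template:                    # blueprint for each Pod
    metadata:
      labels:
        app: my-database
    spec:
      containers:
      - name: postgres
        image: postgres:13     # what container image to use
        env:
        - name: POSTGRES_PASSWORD
          value: mysecretpassword
EOF
Applying this file tells Kubernetes "make reality match this description," and the Deployment takes it from there.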
Setting Up Your First Cluster
To understand how these concepts work in practice, you’ll need a Kubernetes cluster to experiment with. Let’s set up a local environment and deploy a simple application.
Prerequisites
Before we start, make sure you have Docker Desktop installed and running. Minikube uses Docker as its default driver to create the virtual environment for your Kubernetes cluster.
If you don’t have Docker Desktop yet, download it from docker.com and make sure it’s running before proceeding.
Install Minikube
Minikube creates a local Kubernetes cluster perfect for learning and development. Install it by following the official installation guide for your operating system.
You can verify the installation worked by checking the version:
minikube version
Start Your Cluster
Now you’re ready to start your local Kubernetes cluster:
minikube start
This command downloads a base image (the first time you run it), starts a Docker container (or a VM, depending on your driver) to act as your node, and configures a Kubernetes cluster inside it. The process usually takes a few minutes.
You’ll see output like:
😄 minikube v1.33.1 on Darwin 14.1.2
✨ Using the docker driver based on existing profile
👍 Starting control plane node minikube in cluster minikube
🚜 Pulling base image ...
🔄 Restarting existing docker container for "minikube" ...
🐳 Preparing Kubernetes v1.28.3 on Docker 24.0.7 ...
🔎 Verifying Kubernetes components...
🌟 Enabled addons: storage-provisioner, default-storageclass
🏄 Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
Set Up kubectl Access
Now that your cluster is running, you can use kubectl to interact with it. We’ll use the version that comes with Minikube rather than installing it separately to ensure compatibility:
minikube kubectl -- version
You should see version information for both the client and server.
While you could type minikube kubectl -- before every command, the standard practice is to create an alias. This mimics how you'll work with kubectl in cloud environments, where you just type kubectl:
alias kubectl="minikube kubectl --"
Why use an alias? In production environments (AWS EKS, Google GKE, etc.), you'll install kubectl separately and use it directly. By practicing with the kubectl command now, you're building the right muscle memory. The alias lets you use standard kubectl syntax while ensuring you're talking to your local Minikube cluster.
Add this alias to your shell profile (.bashrc, .zshrc, etc.) if you want it to persist across terminal sessions.
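For example, if you use zsh, you could persist it like this (swap in ~/.bashrc if you use bash):
# Append the alias to your profile and reload it
echo 'alias kubectl="minikube kubectl --"' >> ~/.zshrc
source ~/.zshrc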
Verify Your Cluster
Let’s make sure everything is working:
kubectl cluster-info
You should see something like:
Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
Now check what’s running in your cluster:
kubectl get nodes
You should see one node (your Minikube VM):
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   2m    v1.28.3
Perfect! You now have a working Kubernetes cluster.
Deploy Your First Application
Let’s deploy a PostgreSQL database to see Kubernetes in action. We’ll create a Deployment that runs a postgres container. We’ll use PostgreSQL because it’s a common component in data projects, but the steps are the same for any container.
Create the deployment:
kubectl create deployment hello-postgres --image=postgres:13
kubectl set env deployment/hello-postgres POSTGRES_PASSWORD=mysecretpassword
Check what Kubernetes created for you:
kubectl get deployments
You should see:
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
hello-postgres   1/1     1            1           30s
Note: If you see 0/1 in the READY column, that's normal! PostgreSQL needs the POSTGRES_PASSWORD environment variable to start properly. The Deployment automatically restarts the Pod now that the password is set, and you should see it change to 1/1 within a minute.
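Rather than re-running kubectl get deployments until READY shows 1/1, you can also let kubectl wait for you:
# Blocks until the Deployment's Pods are ready
kubectl rollout status deployment/hello-postgres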
Now look at the Pods:
kubectl get pods
You’ll see something like:
NAME                               READY   STATUS    RESTARTS   AGE
hello-postgres-7d8757c6d4-xyz123   1/1     Running   0          45s
Notice how Kubernetes automatically created a Pod with a generated name. The Deployment is managing this Pod for you.
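If you're curious how the pieces connect, you can select the Pods by the label the Deployment stamped on them (kubectl create deployment labels them app=hello-postgres) or ask for the full story:
# Pods managed by the Deployment, selected by label
kubectl get pods -l app=hello-postgres
# Detailed view: replica counts, Pod template, and recent events
kubectl describe deployment hello-postgres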
Connect to Your Application
Your PostgreSQL database is running inside the cluster. There are two common ways to interact with it:
Option 1: Using kubectl exec (direct container access)
kubectl exec -it deployment/hello-postgres -- psql -U postgres
This connects you directly to a PostgreSQL session inside the container. The -it flags give you an interactive terminal. You can run SQL commands directly:
postgres=# SELECT version();
postgres=# \q
Option 2: Using port forwarding (local connection)
kubectl port-forward deployment/hello-postgres 5432:5432
Leave this running and open a new terminal. Now you can connect using any PostgreSQL client on your local machine as if the database were running locally on port 5432. Press Ctrl+C to stop the port forwarding when you’re done.
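For example, if you have the psql client installed locally (an assumption; any PostgreSQL client will do), you can connect from that second terminal like this:
# Connect through the forwarded port; the password is mysecretpassword
psql -h localhost -p 5432 -U postgres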
Both approaches work well. kubectl exec is faster for quick database tasks, while port forwarding lets you use familiar local tools. Choose whichever feels more natural to you.
Let’s break down what you just accomplished:
- You created a Deployment – This told Kubernetes “I want PostgreSQL running”
- Kubernetes created a Pod – The actual container running postgres
- The Pod got scheduled to your Minikube node (the single machine in your cluster)
- You connected to the database – Either directly with kubectl exec or through port forwarding
You didn’t have to worry about which node to use, how to start the container, or how to configure networking. Kubernetes handled all of that based on your simple deployment command.
Next, we’ll see the real magic: what happens when things go wrong.
The Magic Moment: Self-Healing
You’ve deployed your first application, but you haven’t seen Kubernetes’ most powerful feature yet. Let’s break something on purpose and watch Kubernetes automatically fix it.
Break Something on Purpose
First, let’s see what’s currently running:
kubectl get pods
You should see your PostgreSQL Pod running:
NAME                               READY   STATUS    RESTARTS   AGE
hello-postgres-7d8757c6d4-xyz123   1/1     Running   0          5m
Now, let’s “accidentally” delete this Pod. In a traditional Docker setup, this would mean your database is gone until someone manually restarts it:
kubectl delete pod hello-postgres-7d8757c6d4-xyz123
Replace hello-postgres-7d8757c6d4-xyz123 with your actual Pod name from the previous command.
You’ll see:
pod "hello-postgres-7d8757c6d4-xyz123" deleted
Watch the Magic Happen
Immediately check your Pods again:
kubectl get pods
You’ll likely see something like this:
NAME                               READY   STATUS    RESTARTS   AGE
hello-postgres-7d8757c6d4-abc789   1/1     Running   0          10s
Notice what happened:
- The Pod name changed – Kubernetes created a completely new Pod
- It’s already running – The replacement happened automatically
- It happened immediately – No human intervention required
If you’re quick enough, you might catch the Pod in ContainerCreating status as Kubernetes spins up the replacement.
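To catch it live next time, run a watch in a second terminal before deleting the Pod:
# Streams Pod changes as they happen; press Ctrl+C to stop
kubectl get pods --watch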
What Just Happened?
This is Kubernetes’ self-healing behavior in action. Here’s the step-by-step process:
- You deleted the Pod – The container stopped running
- The Deployment noticed – It continuously monitors the actual vs desired state
- State mismatch detected – Desired: 1 Pod running, Actual: 0 Pods running
- Deployment took action – It immediately created a new Pod to match the desired state
- Balance restored – Back to 1 Pod running, as specified in the Deployment
This entire process took seconds and required no human intervention.
Test It Again
Let’s verify the database is working in the new Pod:
kubectl exec deployment/hello-postgres -- psql -U postgres -c "SELECT version();"
Perfect! The database is running normally. The new Pod automatically started with the same configuration (PostgreSQL 13, same password) because the Deployment specification didn’t change. One caveat: any data you created lived in the old Pod’s ephemeral storage and is gone; keeping data across Pod restarts requires a Persistent Volume, which we’ll cover in a later tutorial.
What This Means
This demonstrates Kubernetes’ core value: turning manual, error-prone operations into automated, reliable systems. In production, if a server fails at 3 AM, Kubernetes automatically reschedules your application onto a healthy server without human intervention, far faster than spinning up a new VM and recovering by hand.
You experienced the fundamental shift from imperative to declarative management. You didn’t tell Kubernetes HOW to fix the problem – you only specified WHAT you wanted (“keep 1 PostgreSQL Pod running”), and Kubernetes figured out the rest.
Next, we’ll wrap up with essential tools and guidance for your continued Kubernetes journey.
Cleaning Up
When you’re finished experimenting, you can clean up the resources you created:
# Delete the PostgreSQL deployment
kubectl delete deployment hello-postgres
# Stop your Minikube cluster (optional - saves system resources)
minikube stop
# If you want to completely remove the cluster (optional)
minikube delete
The minikube stop command preserves your cluster for future use while freeing up system resources. Use minikube delete only if you want to start completely fresh next time.
Wrap Up and Next Steps
You’ve successfully set up a Kubernetes cluster, deployed an application, and witnessed self-healing in action. You now understand why Kubernetes exists and how it transforms container management from manual tasks into automated systems.
Now you’re ready to explore:
- Services – How applications communicate within clusters
- ConfigMaps and Secrets – Managing configuration and sensitive data
- Persistent Volumes – Handling data that survives Pod restarts
- Advanced cluster management – Multi-node clusters, node pools, and workload scheduling strategies
- Security and access control – Understanding RBAC and IAM concepts
The official Kubernetes documentation is a great resource for diving deeper.
Remember the complexity trade-off: Kubernetes is powerful but adds operational overhead. Choose it when you need high availability, automatic scaling, or multi-server deployments. For simple applications running on a single machine, Docker Compose is often the better choice. Many teams start with Docker Compose and migrate to Kubernetes as their reliability and scaling requirements grow.
Now you have the foundation to make informed decisions about when and how to use Kubernetes in your data projects.