[Kubernetes Data Platform][Part 3][Main Components]: Install Distributed MinIO Cluster

Viet_1846

7 min read · Aug 4, 2024

MinIO is a high-performance object storage system released under the GNU Affero General Public License v3.0. It is API-compatible with the Amazon S3 cloud storage service. In simpler terms, MinIO is an open-source version of Amazon S3, so any technology that integrates with S3 can also integrate with MinIO. We use MinIO to replace HDFS.
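As a quick illustration of that compatibility, any S3 client can talk to MinIO simply by overriding the endpoint URL. A minimal sketch with the AWS CLI, assuming the cluster built later in this article is up behind the load balancer at 172.16.16.100:9000 (the bucket and file names are just examples):

# MinIO accepts standard AWS-style credentials and S3 API calls
> export AWS_ACCESS_KEY_ID=admin
> export AWS_SECRET_ACCESS_KEY=password
> export AWS_DEFAULT_REGION=us-east-1
> aws --endpoint-url http://172.16.16.100:9000 s3 mb s3://demo-bucket
> aws --endpoint-url http://172.16.16.100:9000 s3 cp ./hello.txt s3://demo-bucket/
> aws --endpoint-url http://172.16.16.100:9000 s3 ls s3://demo-bucket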

In previous sections, we deployed MinIO on Kubernetes to test the functionality of the k8s cluster. However, that method is only suitable for testing and is not appropriate for a production environment. In this article, I will show how to deploy a MinIO cluster with multiple nodes, each with multiple disks attached.

Deployment model

  • There are 4 MinIO nodes, each with 4 disks attached, and one node acting as a load balancer.
Diagram: MinIO multi-node, multi-disk cluster

DEPLOYMENT STEPS

1. Provision virtual machines using Vagrant and VirtualBox:

  • Including 4 servers for MinIO and 1 load balancer server
  • loadbalancer (IP 172.16.16.100)
  • minio1 — minio4 (IP 172.16.16.101–172.16.16.104)
  • Modify the disk path in the Vagrantfile (the line starting with disk =) to change where the disk images attached to the virtual machines are stored. (Details can be found in the Vagrantfile.)

2. Create mount points and mount the corresponding disks to the mount points.

3. Copy the docker-compose file corresponding to each node and start the MinIO service on each node.

4. Configure HAProxy as a load balancer for the cluster.

5. Destroy The Cluster

Advantages of this deployment method:

  • High availability: with MinIO's default erasure coding for this pool (EC:4 parity across the 16-drive set), the cluster can tolerate the loss of up to 4 drives, i.e. one full node, without affecting data availability.
  • Scalability: The cluster can be easily scaled by adding more nodes.
  • Performance: objects are striped across all 16 drives, so multiple disks and nodes contribute aggregate throughput.

HANDS-ON STEPS

Reference Repository: https://github.com/viethqb/data-platform-notes/tree/main/minio/VM

IMPORTANT: Convention for Bash Scripts

Throughout this series, the following convention applies to Bash scripts:

Commands in the form "> command" should be executed on the local laptop.

Commands in the form "root@minio1:~# command" should be executed on the minio1 VM as the root user.

1. Provision virtual machines using Vagrant and VirtualBox

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :

ENV['VAGRANT_NO_PARALLEL'] = 'yes'

Vagrant.configure(2) do |config|

  config.vm.provision "shell", path: "bootstrap.sh"

  # Load Balancer Node
  config.vm.define "loadbalancer" do |lb|
    lb.vm.box = "generic/ubuntu2204"
    lb.vm.hostname = "loadbalancer"
    lb.vm.network "private_network", ip: "172.16.16.100"
    lb.vm.provider "virtualbox" do |v|
      v.name = "loadbalancer"
      v.memory = 2048
      v.cpus = 1
    end
  end

  # Number of MinIO VMs
  MinioCount = 4

  # Configure MinIO VMs
  (1..MinioCount).each do |i|
    config.vm.define "minio#{i}" do |minionode|
      minionode.vm.box = "generic/ubuntu2204"
      minionode.vm.hostname = "minio#{i}"
      minionode.vm.network "private_network", ip: "172.16.16.10#{i}"

      minionode.vm.provider "virtualbox" do |v|
        v.name = "minio#{i}"
        v.memory = 2048
        v.cpus = 2

        # Attach four disks to each VM
        (1..4).each do |disk_number|
          # Change this path to control where the disk images are stored
          disk = "/data/virtualbox/vms/minio#{i}/disk#{disk_number}.vdi"
          # Create the 20 GB disk image only if it does not already exist
          unless File.exist?(disk)
            v.customize ['createhd', '--filename', disk, '--size', 20 * 1024]
          end
          v.customize ['storageattach', :id, '--storagectl', 'SATA Controller', '--port', disk_number, '--device', 0, '--type', 'hdd', '--medium', disk]
        end
      end
    end
  end
end

Start the virtual machines

> cd ~/Documents
> git clone https://github.com/viethqb/data-platform-notes.git
> cd data-platform-notes/minio/VM
> vagrant up

# If a previous run left orphaned disk images behind, list and delete them:
# vboxmanage list hdds
# vboxmanage closemedium <UUID> --delete

# Enable passwordless root SSH to every node for the steps that follow
> ssh-copy-id [email protected]
> ssh-copy-id [email protected]
> ssh-copy-id [email protected]
> ssh-copy-id [email protected]
> ssh-copy-id [email protected]
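Before moving on, it is worth confirming that passwordless root SSH works everywhere, since the following steps depend on it. A quick check from the laptop:

# Should print each hostname without prompting for a password
> for ip in 172.16.16.10{0..4}; do ssh root@$ip hostname; done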

2. Create mount points and mount the corresponding disks to the mount points

Run on minio1

  • lsblk lists information about all available (or specified) block devices.
root@minio1:~# lsblk

=> sdb, sdc, sdd, and sde are the storage disks for minio1.

=> Mount sdb, sdc, sdd, and sde to /data/minio/data-1, /data/minio/data-2, /data/minio/data-3, and /data/minio/data-4 respectively.

# Create the mount points
root@minio1:~# mkdir -p /data/minio/data-1
root@minio1:~# mkdir -p /data/minio/data-2
root@minio1:~# mkdir -p /data/minio/data-3
root@minio1:~# mkdir -p /data/minio/data-4

# Format each disk with XFS (this erases the disk)
root@minio1:~# mkfs.xfs /dev/sdb
root@minio1:~# mkfs.xfs /dev/sdc
root@minio1:~# mkfs.xfs /dev/sdd
root@minio1:~# mkfs.xfs /dev/sde

# Add the disks to /etc/fstab and mount them
root@minio1:~# echo "/dev/sdb /data/minio/data-1 xfs defaults,noatime,nofail 0 0" >> /etc/fstab
root@minio1:~# echo "/dev/sdc /data/minio/data-2 xfs defaults,noatime,nofail 0 0" >> /etc/fstab
root@minio1:~# echo "/dev/sdd /data/minio/data-3 xfs defaults,noatime,nofail 0 0" >> /etc/fstab
root@minio1:~# echo "/dev/sde /data/minio/data-4 xfs defaults,noatime,nofail 0 0" >> /etc/fstab
root@minio1:~# mount -a
root@minio1:~# lsblk

Repeat on minio2, minio3, and minio4, or script it as sketched below.
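If you would rather not repeat those commands by hand, the whole disk preparation can be scripted from the laptop. A sketch, assuming the four data disks appear as sdb through sde on every node (verify with lsblk first, exactly as above):

# Run from the laptop; prepares the disks on minio2, minio3, and minio4
for ip in 172.16.16.102 172.16.16.103 172.16.16.104; do
  ssh root@${ip} 'bash -s' <<'EOF'
for n in 1 2 3 4; do mkdir -p /data/minio/data-${n}; done
i=1
for dev in sdb sdc sdd sde; do
  mkfs.xfs /dev/${dev}
  echo "/dev/${dev} /data/minio/data-${i} xfs defaults,noatime,nofail 0 0" >> /etc/fstab
  i=$((i + 1))
done
mount -a && lsblk
EOF
done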

3. Start the MinIO service

docker-compose.minio1.yaml

services:
  minio1:
    hostname: minio1
    container_name: minio1
    image: quay.io/minio/minio:RELEASE.2024-03-30T09-41-56Z
    # http://minio{1...4}/data-{1...4} expands to all 16 drives across the 4 nodes;
    # every node must be started with this same pool definition
    command: server --console-address ":9001" http://minio{1...4}/data-{1...4}
    ports:
      - "9000:9000"
      - "9001:9001"
    extra_hosts:
      - "minio1:172.16.16.101"
      - "minio2:172.16.16.102"
      - "minio3:172.16.16.103"
      - "minio4:172.16.16.104"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    volumes:
      - /data/minio/data-1:/data-1
      - /data/minio/data-2:/data-2
      - /data/minio/data-3:/data-3
      - /data/minio/data-4:/data-4
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

docker-compose.minio2.yaml

services:
  minio2:
    hostname: minio2
    container_name: minio2
    image: quay.io/minio/minio:RELEASE.2024-03-30T09-41-56Z
    command: server --console-address ":9001" http://minio{1...4}/data-{1...4}
    ports:
      - "9000:9000"
      - "9001:9001"
    extra_hosts:
      - "minio1:172.16.16.101"
      - "minio2:172.16.16.102"
      - "minio3:172.16.16.103"
      - "minio4:172.16.16.104"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    volumes:
      - /data/minio/data-1:/data-1
      - /data/minio/data-2:/data-2
      - /data/minio/data-3:/data-3
      - /data/minio/data-4:/data-4
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

docker-compose.minio3.yaml

services:
  minio3:
    hostname: minio3
    container_name: minio3
    image: quay.io/minio/minio:RELEASE.2024-03-30T09-41-56Z
    command: server --console-address ":9001" http://minio{1...4}/data-{1...4}
    ports:
      - "9000:9000"
      - "9001:9001"
    extra_hosts:
      - "minio1:172.16.16.101"
      - "minio2:172.16.16.102"
      - "minio3:172.16.16.103"
      - "minio4:172.16.16.104"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    volumes:
      - /data/minio/data-1:/data-1
      - /data/minio/data-2:/data-2
      - /data/minio/data-3:/data-3
      - /data/minio/data-4:/data-4
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

docker-compose.minio4.yaml

services:
  minio4:
    hostname: minio4
    container_name: minio4
    image: quay.io/minio/minio:RELEASE.2024-03-30T09-41-56Z
    command: server --console-address ":9001" http://minio{1...4}/data-{1...4}
    ports:
      - "9000:9000"
      - "9001:9001"
    extra_hosts:
      - "minio1:172.16.16.101"
      - "minio2:172.16.16.102"
      - "minio3:172.16.16.103"
      - "minio4:172.16.16.104"
    environment:
      MINIO_ROOT_USER: admin
      MINIO_ROOT_PASSWORD: password
    volumes:
      - /data/minio/data-1:/data-1
      - /data/minio/data-2:/data-2
      - /data/minio/data-3:/data-3
      - /data/minio/data-4:/data-4
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3

Copy the docker-compose file corresponding to each node

> scp docker-compose.minio1.yaml [email protected]:~
> scp docker-compose.minio2.yaml [email protected]:~
> scp docker-compose.minio3.yaml [email protected]:~
> scp docker-compose.minio4.yaml [email protected]:~

Install Docker on all MinIO nodes

root@minio1:~# curl -fsSL https://get.docker.com -o get-docker.sh
root@minio1:~# sh get-docker.sh

root@minio2:~# curl -fsSL https://get.docker.com -o get-docker.sh
root@minio2:~# sh get-docker.sh

root@minio3:~# curl -fsSL https://get.docker.com -o get-docker.sh
root@minio3:~# sh get-docker.sh

root@minio4:~# curl -fsSL https://get.docker.com -o get-docker.sh
root@minio4:~# sh get-docker.sh
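If you prefer, the Docker installation can also be driven from the laptop in one loop:

# Installs Docker on minio1 through minio4
> for ip in 172.16.16.10{1..4}; do ssh root@$ip 'curl -fsSL https://get.docker.com | sh'; done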

Start the MinIO service

root@minio1:~# docker compose -f docker-compose.minio1.yaml up -d
root@minio1:~# docker logs -f minio1

root@minio2:~# docker compose -f docker-compose.minio2.yaml up -d
root@minio2:~# docker logs -f minio2

root@minio3:~# docker compose -f docker-compose.minio3.yaml up -d
root@minio3:~# docker logs -f minio3

root@minio4:~# docker compose -f docker-compose.minio4.yaml up -d
root@minio4:~# docker logs -f minio4
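Once all four containers are up, the cluster state can be checked with the MinIO Client (mc). A minimal check, assuming mc is installed on the laptop; it should report 4 servers online and 16 drives:

> mc alias set dp http://172.16.16.101:9000 admin password
> mc admin info dp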

4. Configure HAProxy as a load balancer for the cluster

root@loadbalancer:~# apt-get update -qq && apt-get install -y haproxy -qq
root@loadbalancer:~# cat >> /etc/haproxy/haproxy.cfg << EOF

frontend minio-frontend
    bind *:9000
    mode tcp
    option tcplog
    default_backend minio-backend

backend minio-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server minio-1 172.16.16.101:9000 check fall 3 rise 2
    server minio-2 172.16.16.102:9000 check fall 3 rise 2
    server minio-3 172.16.16.103:9000 check fall 3 rise 2
    server minio-4 172.16.16.104:9000 check fall 3 rise 2

frontend minio-console-frontend
    bind *:9001
    mode tcp
    option tcplog
    default_backend minio-console-backend

backend minio-console-backend
    mode tcp
    option tcp-check
    balance roundrobin
    server minio-1 172.16.16.101:9001 check fall 3 rise 2
    server minio-2 172.16.16.102:9001 check fall 3 rise 2
    server minio-3 172.16.16.103:9001 check fall 3 rise 2
    server minio-4 172.16.16.104:9001 check fall 3 rise 2
EOF
root@loadbalancer:~# systemctl enable haproxy.service
root@loadbalancer:~# systemctl restart haproxy.service
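A quick liveness probe through the load balancer confirms that HAProxy is forwarding requests to a healthy backend:

> curl -f http://172.16.16.100:9000/minio/health/live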

Open the MinIO console at http://172.16.16.100:9001/ in your web browser and log in with user admin and password password.
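To exercise the path through HAProxy end to end, and to see the high-availability claim in action, you can run a quick read/write test and then stop one node (the bucket and file names here are just examples):

> mc alias set dp-lb http://172.16.16.100:9000 admin password
> mc mb dp-lb/test-bucket
> echo "hello minio" > /tmp/hello.txt
> mc cp /tmp/hello.txt dp-lb/test-bucket/

# Simulate a node failure: with 12 of 16 drives still online, reads keep working
> ssh [email protected] 'docker stop minio4'
> mc ls dp-lb/test-bucket
> ssh [email protected] 'docker start minio4'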

5. Destroy The Cluster

> vagrant destroy -f

Conclusion

In this article, I've outlined the deployment of a multi-node, multi-disk MinIO cluster. We've explored the benefits of this approach, including high availability, scalability, and performance improvements.

To enhance your MinIO cluster even further, here are some additional configurations we’ll delve into in a future article:

  • HTTPS Configuration: Establishing secure communication between clients and the MinIO cluster using HTTPS is crucial. We’ll explore how to configure HTTPS for your MinIO deployment.
  • Monitoring with Prometheus: Proactive monitoring is essential for maintaining a healthy MinIO cluster. We’ll discuss how to integrate Prometheus to monitor key metrics and ensure optimal cluster performance.

By incorporating these configurations, you can strengthen the security and observability of your MinIO deployment.

In the next article, we will deploy Hive Metastore and Trino, and configure Trino to read and write the Apache Iceberg table format. This is a very important part of a data platform.