How to contribute a limited/specific amount of storage from a slave node to a Hadoop cluster

Kaushik Denge
2 min read · Dec 9, 2023


Introduction

Hadoop's HDFS spreads data across the slave (DataNode) machines in a cluster. By default, a DataNode offers HDFS whatever free space is available under its configured data directories, so if you want a node to contribute only a fixed amount of storage, you have to impose that limit yourself. A straightforward way to do it is to carve out a dedicated partition of exactly the size you want to share and point the DataNode at that partition. This post walks through that approach step by step.

Let’s proceed with a hypothetical scenario. For the sake of this example:

1. Cluster Details:

  • Assume you have a small Hadoop cluster with three nodes, named node1, node2, and node3.
  • Each node runs a Linux-based operating system.

2. Storage Contribution:

  • You want to contribute 100 GB of storage from each slave node to the Hadoop cluster.

3. Linux Partitioning:

  • We will create a dedicated 100 GB partition on a spare disk on each slave node and allocate it to Hadoop data.

Now, let’s go step by step:

Step 1: SSH into Each Slave Node

Use SSH to connect to each slave node:

ssh username@node1
ssh username@node2
ssh username@node3

Step 2: Identify Available Storage

Check the current disk space on each node:

df -h

Identify a spare disk (or unallocated space) with at least 100 GB available for the Hadoop contribution.
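
Note that df -h only lists filesystems that are already mounted. To spot a spare disk that has no partitions or mounts yet, lsblk is handy:

# lsblk lists every block device, mounted or not, so an unused disk such as /dev/sdb
# stands out even before it has a partition table or filesystem
lsblk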

Step 3: Create a Dedicated Partition

Assuming you have identified the /dev/sdb disk as available, create a new partition:

sudo fdisk /dev/sdb

Follow the prompts to create a new 100 GB primary partition; the partition size is what caps how much storage this node contributes (see the sketch below). When you are done, write the changes and exit.
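
Inside fdisk, the key sequence for a single 100 GB primary partition typically looks like this (prompts vary slightly between fdisk versions, so treat it as a sketch):

# Keystrokes inside fdisk:
#   n        -> new partition
#   p        -> primary
#   1        -> partition number 1
#   <Enter>  -> accept the default first sector
#   +100G    -> last sector: size the partition at 100 GB, the amount this node will contribute
#   w        -> write the partition table and exit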

Step 4: Format the New Partition

Format the newly created partition:

sudo mkfs.ext4 /dev/sdb1
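
As a quick sanity check (assuming the new partition came up as /dev/sdb1), blkid should now report an ext4 filesystem along with its UUID; note the UUID, since it can be used in /etc/fstab later:

sudo blkid /dev/sdb1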

Step 5: Mount the Partition

Create a mount point and mount the partition:

sudo mkdir /hadoop_data
sudo mount /dev/sdb1 /hadoop_data

Step 6: Update /etc/fstab for Permanent Mount

To ensure the partition is mounted on system startup, add an entry to /etc/fstab:

echo "/dev/sdb1 /hadoop_data ext4 defaults 0 0" | sudo tee -a /etc/fstab

Step 7: Verify the Mount

Verify that the partition is mounted correctly:

df -h
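
It is also worth checking that the /etc/fstab entry itself works before the next reboot. A common trick is to unmount the partition and let mount -a remount everything listed in fstab:

sudo umount /hadoop_data
sudo mount -a
df -h /hadoop_data   # should show the ~100 GB partition mounted again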

Step 8: Configure Hadoop to Use the New Storage

Update your Hadoop configuration files (e.g., hdfs-site.xml) to include the new directory (/hadoop_data) for Hadoop data storage.
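
In current Hadoop releases the property that lists DataNode storage directories is dfs.datanode.data.dir (very old releases call it dfs.data.dir). A minimal hdfs-site.xml entry for this scenario might look like the following; the exact file location and any directories you already have configured depend on your installation:

<property>
  <name>dfs.datanode.data.dir</name>
  <!-- comma-separate multiple directories if the DataNode should keep its existing ones too -->
  <value>/hadoop_data</value>
</property>

The directory must also be writable by the user that runs the DataNode process, for example sudo chown -R hdfs:hadoop /hadoop_data if your installation uses the conventional hdfs user (adjust owner and group to match your setup).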

Step 9: Restart Hadoop Services

Restart the Hadoop services to apply the changes:

# Assuming a package-based Hadoop installation that registers system services
sudo service hadoop-hdfs-datanode restart
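
The service name above is what package-based installations (for example Bigtop or Cloudera packages) typically register. If you run a plain Apache Hadoop 3.x tarball install instead, the equivalent restart on each DataNode would be something like:

hdfs --daemon stop datanode
hdfs --daemon start datanode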

Repeat these steps on each slave node.
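
Once the DataNodes are back up, you can confirm that the NameNode sees the extra capacity. From any node with the Hadoop client configured, the dfsadmin report lists the configured capacity per DataNode:

hdfs dfsadmin -report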

This example assumes a basic scenario and configuration; the exact steps may vary with your Hadoop distribution and cluster setup, so adjust them to your environment.

Thank you for reading!
