Integrating LVM with a Hadoop Cluster

Harshil Shah
Mar 11, 2021

Here I will use LVM to make the storage that a DataNode contributes to a Hadoop cluster elastic, so it can be grown on the fly.

What is LVM?

Logical Volume Management (LVM) creates a layer of abstraction over physical storage, allowing you to create logical storage volumes. With LVM in place, you are no longer bound by physical disk sizes: the hardware is hidden from the software, so volumes can be grown or moved without stopping applications or unmounting file systems. You can think of logical volumes as dynamic partitions.

What is Hadoop?

Hadoop is an open-source, Java-based framework that supports the storage and processing of extremely large data sets in a distributed computing environment. It is an Apache project, sponsored by the Apache Software Foundation.

To start, I have one DataNode connected to the master (NameNode), sharing a total of 46.9 GB of storage. I am using Red Hat Linux VMs here.

hadoop dfsadmin -report

Use the above command to see how many DataNodes (slave nodes) are connected to the master and how much storage they are sharing.
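
On a real cluster the report is long. Below is a minimal sketch that keeps only the capacity and DataNode-count lines. It assumes GNU grep and the labels a typical report prints: Hadoop 1.x says "Datanodes available", newer releases say "Live datanodes". summarize_report is a hypothetical helper name. On Hadoop 2 and later, hdfs dfsadmin -report is the non-deprecated spelling of the same command.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: keep only the capacity and DataNode-count lines of a
# dfsadmin report read from stdin. "Datanodes available" is the Hadoop 1.x
# label, "Live datanodes" the Hadoop 2.x+ one.
summarize_report() {
  grep -E '^(Configured Capacity|DFS Remaining|Datanodes available|Live datanodes)' || true
}

if command -v hadoop >/dev/null 2>&1; then
  hadoop dfsadmin -report | summarize_report
else
  echo "hadoop is not on PATH; run this where the Hadoop client is installed" >&2
fi
```

Running it before and after the LVM steps is a quick way to compare the cluster's configured capacity.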

Step 1:

Add a new virtual hard disk from the settings of your DataNode VM, i.e. the machine whose directory is shared with the Hadoop cluster.

Details of the attached disk can be seen with the fdisk -l command.
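
Below is a minimal check that the new disk is visible before you hand it to LVM. The device name /dev/sdb is an assumption: a second disk often shows up there, but read the fdisk -l output on your own VM. disk_present is a hypothetical helper, and fdisk needs root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed device name of the newly added disk; override with DISK=/dev/... .
DISK="${DISK:-/dev/sdb}"

# Hypothetical helper: succeed only if the path is a block device the kernel can see.
disk_present() {
  [ -b "$1" ]
}

if disk_present "$DISK"; then
  fdisk -l "$DISK"   # size and partition table of the new disk (needs root)
  lsblk "$DISK"      # the same disk as a tree, showing anything already on it
else
  echo "$DISK is not visible; check the disk in the VM settings" >&2
fi
```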

Step 2:

Convert the hard disk into a Physical Volume (PV)

The pvcreate command initializes a disk so that it can become part of a volume group.
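
Here is a sketch of this step. It assumes the new disk is /dev/sdb, a hypothetical name. pvcreate writes LVM metadata onto the disk, so the hypothetical run helper only prints each command unless DRY_RUN=0 is set. pvdisplay then shows the new PV.

```shell
#!/usr/bin/env bash
set -euo pipefail

DISK="${DISK:-/dev/sdb}"   # assumed device name of the new disk

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: initialise a disk as an LVM physical volume and show it.
make_pv() {
  run pvcreate "$1"
  run pvdisplay "$1"
}

make_pv "$DISK"
```

Once the printed commands look right, run it as root with DRY_RUN=0.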

Step 3:

Create a volume group

Physical volumes are combined into volume groups (VGs). A VG forms a pool of disk space out of which logical volumes can be allocated.
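
Assuming the PV from Step 2 is /dev/sdb and using the hypothetical VG name hadoopvg, this step is a single vgcreate call. To pool several disks, list more PVs after the VG name. As before, the hypothetical run helper only prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

VG="${VG:-hadoopvg}"   # hypothetical volume group name
PV="${PV:-/dev/sdb}"   # assumed physical volume created in Step 2

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: create a VG from one or more PVs, then show it.
make_vg() {
  local vg="$1"
  shift
  run vgcreate "$vg" "$@"
  run vgdisplay "$vg"
}

make_vg "$VG" "$PV"
```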

Step 4:

Create a Logical Volume

I have created a logical volume of 5 GB for now.
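
Here is a sketch of creating the 5 GB LV. The VG name hadoopvg and the LV name hadooplv are hypothetical. lvdisplay prints the LV Path that the next step needs. The hypothetical run helper prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

VG="${VG:-hadoopvg}"   # hypothetical volume group name
LV="${LV:-hadooplv}"   # hypothetical logical volume name
SIZE="${SIZE:-5G}"     # 5 GB, as in this article

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: carve an LV of the given size out of a VG, then show it.
# Arguments: VG name, LV name, size.
make_lv() {
  run lvcreate --size "$3" --name "$2" "$1"
  run lvdisplay "/dev/$1/$2"   # the "LV Path" field here is /dev/<vg>/<lv>
}

make_lv "$VG" "$LV" "$SIZE"
```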

Step 5:

Format the logical volume. We have to provide its full path, i.e. the LV Path value printed by lvdisplay, of the form /dev/<vg-name>/<lv-name>.
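
This sketch assumes ext4. That choice fits Step 8, because resize2fs only works on the ext2/ext3/ext4 family. The LV path /dev/hadoopvg/hadooplv is hypothetical. mkfs destroys whatever is already on the volume, so the hypothetical run helper only prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

LV_PATH="${LV_PATH:-/dev/hadoopvg/hadooplv}"   # hypothetical LV Path

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: create an ext4 filesystem on the whole logical volume.
format_lv() {
  run mkfs.ext4 "$1"
}

format_lv "$LV_PATH"
```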

Step 6:

Mount the logical volume on the directory the DataNode shares with the Hadoop cluster, i.e. its data directory as configured in hdfs-site.xml.
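
The sketch below uses the hypothetical mount point /dn1. Replace it with the directory the DataNode is configured to use in hdfs-site.xml (dfs.data.dir on Hadoop 1.x, dfs.datanode.data.dir on 2.x and later). Mounting hides whatever was in that directory before, so the DataNode most likely needs a restart to pick up the fresh volume. The hypothetical run helper prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

LV_PATH="${LV_PATH:-/dev/hadoopvg/hadooplv}"   # hypothetical LV Path
MOUNT_POINT="${MOUNT_POINT:-/dn1}"             # hypothetical DataNode directory

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: make sure the directory exists, then mount the LV on it.
mount_lv() {
  run mkdir -p "$2"
  run mount "$1" "$2"
}

mount_lv "$LV_PATH" "$MOUNT_POINT"
```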

Now, when I check the report again with hadoop dfsadmin -report, the DataNode reports the capacity of the new logical volume instead of the original 46.9 GB volume.

Step 7:

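To increase the size of the LV, use the lvextend command. I am adding 15 GB to the logical volume only. Because the DataNode directory lives on this LV, the storage capacity of the Hadoop cluster grows with it once the filesystem is resized in Step 8, with no change to the Hadoop configuration.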
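
Here is a sketch of extending the hypothetical LV by 15 GB. The + prefix makes the size relative. Without it, --size 15G would set the LV to 15 GB in total. The volume group needs at least that much free space, which vgs shows in its VFree column. lvextend also accepts --resizefs to grow the filesystem in the same call. As before, the hypothetical run helper prints commands unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

VG="${VG:-hadoopvg}"                           # hypothetical volume group name
LV_PATH="${LV_PATH:-/dev/hadoopvg/hadooplv}"   # hypothetical LV Path
EXTRA="${EXTRA:-+15G}"                         # relative size: add 15 GB

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: show the VG's free space, then grow the LV.
# Arguments: VG name, LV path, size change.
extend_lv() {
  run vgs "$1"
  run lvextend --size "$3" "$2"
}

extend_lv "$VG" "$LV_PATH" "$EXTRA"
```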

Step 8:

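Now, grow the filesystem into the space just added to the LV with the resize2fs command. This does not reformat anything: existing data is kept, and for ext4 it works while the filesystem is mounted.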
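
A sketch of this step on the hypothetical LV path. With no size argument, resize2fs grows the filesystem to fill the whole volume. If the volume had been formatted with XFS instead, the equivalent tool is xfs_growfs, which takes the mount point rather than the device. The hypothetical run helper prints the command unless DRY_RUN=0.

```shell
#!/usr/bin/env bash
set -euo pipefail

LV_PATH="${LV_PATH:-/dev/hadoopvg/hadooplv}"   # hypothetical LV Path

# Print each command; execute it only when DRY_RUN=0 (default: dry run).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# Hypothetical helper: grow an ext2/3/4 filesystem to the full size of its LV.
grow_fs() {
  run resize2fs "$1"   # no size argument: use all space in the volume
}

grow_fs "$LV_PATH"
```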

Step 9:

Check the size of the storage with the df -h command.

My LVM volume has grown to 19G after adding 15G. That is a little less than 5G + 15G = 20G, because the filesystem uses part of the space for its own metadata.
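
Below is a small read-only check, assuming the hypothetical mount point /dn1. df -h prints sizes in powers of 1024. The -P flag keeps each filesystem on a single line, which makes the output easy to compare before and after resizing. show_fs is a hypothetical helper.

```shell
#!/usr/bin/env bash
set -euo pipefail

MOUNT_POINT="${MOUNT_POINT:-/dn1}"   # hypothetical DataNode directory

# Hypothetical helper: print df's line (without the header) for the
# filesystem that holds the given path.
show_fs() {
  df -hP "$1" | tail -n +2
}

if [ -d "$MOUNT_POINT" ]; then
  show_fs "$MOUNT_POINT"
else
  echo "$MOUNT_POINT does not exist" >&2
fi
```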

The Hadoop cluster's storage capacity has increased too, to 19.62 GB from 4 GB before.

So we have dynamically increased the Hadoop cluster's storage using LVM.

Thank you for reading it through!
