Deduplication is a software feature that removes duplicate (redundant) data blocks from a filesystem to save disk space. The Btrfs filesystem is a modern Copy-on-Write (CoW) filesystem that supports deduplication.

If you need to keep a lot of redundant data (e.g., file backups or databases) on your computer, the Copy-on-Write (CoW) and deduplication features of the Btrfs filesystem can save a huge amount of disk space.
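To make the idea of redundant data blocks concrete, here is a minimal conceptual sketch in shell. It reads files in fixed-size blocks, hashes each block with sha256sum, and counts how many blocks repeat a block that was already seen. The 128 KiB block size and the helper name redundant_blocks are assumptions for illustration. This is not how Btrfs or duperemove is implemented; it only shows the kind of duplication a deduplication tool looks for.

```shell
#!/bin/sh
# Conceptual sketch only: count redundant fixed-size blocks across files.
# BLOCK_SIZE is an assumed value chosen for illustration (128 KiB).
BLOCK_SIZE=131072

# redundant_blocks FILE... : hypothetical helper that prints the number of
# blocks whose content repeats an earlier block (total blocks - unique blocks).
redundant_blocks() {
    hashes=$(
        for f in "$@"; do
            size=$(wc -c < "$f")
            # Number of blocks, counting a trailing partial block as one block.
            n=$(( (size + BLOCK_SIZE - 1) / BLOCK_SIZE ))
            i=0
            while [ "$i" -lt "$n" ]; do
                # Read block number i and print only its SHA-256 hash.
                dd if="$f" bs="$BLOCK_SIZE" skip="$i" count=1 2>/dev/null \
                    | sha256sum | cut -d ' ' -f 1
                i=$((i + 1))
            done
        done
    )
    total=$(printf '%s\n' "$hashes" | grep -c .)
    unique=$(printf '%s\n' "$hashes" | grep . | sort -u | wc -l)
    echo $((total - unique))
}
```

For example, running redundant_blocks on a file and an identical copy of it should report every block of the copy as redundant. Reading the files block by block with dd is slow, so this is only meant for small test files.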

In this article, I will show you how to save disk space using the Btrfs deduplication feature. So, let’s get started.

Prerequisites:

To try out the examples in this article,

  • You must have the Btrfs filesystem installed on your computer.
  • You need to have a hard disk or SSD with at least 1 free partition (of any size).

I have a 20 GB hard disk, sdb, on my Ubuntu machine. I have created 2 partitions, sdb1 and sdb2, on this hard disk. I will use the partition sdb1 in this article.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd1.png" width="691" height="287">

Your hard disk or SSD may have a different name than mine, and so may its partitions. So, make sure to replace them with yours from now on.

If you need any assistance with installing the Btrfs filesystem on Ubuntu, check my article Install and Use Btrfs on Ubuntu 20.04 LTS.

If you need any assistance with installing the Btrfs filesystem on Fedora, check my article Install and Use Btrfs on Fedora 33.

Creating a Btrfs Filesystem:

To experiment with Btrfs deduplication, you need to create a Btrfs filesystem.

To create a Btrfs filesystem with the label data on the sdb1 partition, run the following command:

$ sudo mkfs.btrfs -L data /dev/sdb1

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd2.png" width="645" height="555">

Mounting a Btrfs Filesystem:

Create a directory /data with the following command:

$ sudo mkdir /data

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd3.png" width="636" height="134">

To mount the Btrfs filesystem created on the sdb1 partition on the /data directory, run the following command:

$ sudo mount /dev/sdb1 /data

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd4.png" width="546" height="79">

The Btrfs filesystem should be mounted, as you can see in the screenshot below.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd5.png" width="661" height="154">
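If you want to confirm from a script that /data really is on Btrfs (for example, before running the deduplication steps later in this article), a small sketch using GNU stat in file-system mode could look like this. The helper names fs_type and require_btrfs are hypothetical.

```shell
#!/bin/sh
# fs_type PATH: print the type of the filesystem that contains PATH,
# using stat in file-system mode (-f) with the %T (type name) format.
fs_type() {
    stat -f -c %T "$1"
}

# require_btrfs PATH: hypothetical helper that succeeds only if PATH is on Btrfs.
require_btrfs() {
    fstype=$(fs_type "$1") || return 1
    if [ "$fstype" = "btrfs" ]; then
        return 0
    fi
    echo "$1 is on '$fstype', not btrfs" >&2
    return 1
}
```

With the filesystem mounted as above, require_btrfs /data && echo "/data is Btrfs" should print the message, while a path on another filesystem makes the helper fail with an explanation.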

Installing Deduplication Tools on Ubuntu 20.04 LTS:

To deduplicate a Btrfs filesystem, you need to install the duperemove program on your computer.

If you’re using Ubuntu 20.04 LTS, then you can install duperemove from the official package repository of Ubuntu.

First, update the APT package repository cache with the following command:

$ sudo apt update

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd6.png" width="727" height="283">

Install the duperemove package with the following command:

$ sudo apt install duperemove -y

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd7.png" width="637" height="86">

The duperemove package should be installed.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd8.png" width="1047" height="458">

Installing Deduplication Tools on Fedora 33:

To deduplicate a Btrfs filesystem, you need to install the duperemove program on your computer.

If you’re using Fedora 33, then you can install duperemove from the official package repository of Fedora.

First, update the DNF package repository cache with the following command:

$ sudo dnf makecache

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd9.png" width="910" height="241">

Install the duperemove package with the following command:

$ sudo dnf install duperemove

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd10.png" width="569" height="93">

To confirm the installation, press Y and then press Enter.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd11.png" width="917" height="417">

The duperemove package should be installed.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd12.png" width="911" height="557">
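If you maintain both Ubuntu and Fedora machines, the two installation paths described for Ubuntu 20.04 LTS and Fedora 33 can be folded into one small helper. This is a sketch under assumptions: the helper name install_cmd is hypothetical, it only recognizes the ID values ubuntu and fedora from /etc/os-release, and it prints the command instead of running it so you can review it first.

```shell
#!/bin/sh
# install_cmd ID: hypothetical helper that prints the command that installs
# duperemove for a distribution ID as found in /etc/os-release.
install_cmd() {
    case "$1" in
        ubuntu) echo "sudo apt install -y duperemove" ;;
        fedora) echo "sudo dnf install -y duperemove" ;;
        *)      echo "unsupported distribution: $1" >&2; return 1 ;;
    esac
}
```

On the machine itself, you could run . /etc/os-release && install_cmd "$ID" to see which command applies.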

Testing Deduplication on a Btrfs Filesystem:

In this section, I am going to do a simple test to show you how the deduplication feature of the Btrfs filesystem removes redundant data from the filesystem and saves disk space.

As you can see,

  1. I have copied a file QGIS-OSGeo4W-3.14.0-1-Setup-x86_64.exe to the /data directory. The file is 407 MB in size.
  2. The total size of the files in the /data directory is 407 MB.
  3. With only this file on it, the Btrfs filesystem mounted on the /data directory uses about 412 MB of disk space.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd13.png" width="1162" height="316">

As you can see,

  1. I have copied the same file to the /data directory again and renamed the copy to QGIS-OSGeo4W-3.14.0-1-Setup-x86_64.2.exe.
  2. The total size of the files in the /data directory is now 814 MB.
  3. The files consume about 820 MB of disk space on the Btrfs filesystem mounted on the /data directory.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd14.png" width="1152" height="371">
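One way to take measurements like these yourself is to compare the total file size reported by du with the space used on the filesystem reported by df. The following is a minimal sketch; the helper names files_kib and used_kib are hypothetical, and the values are in KiB rather than MB.

```shell
#!/bin/sh
# files_kib PATH: total size of the files under PATH in KiB, as counted by du.
files_kib() {
    du -sk "$1" | cut -f 1
}

# used_kib PATH: space used on the filesystem containing PATH in KiB,
# taken from the "Used" column of POSIX-format (-P) df output.
used_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $3 }'
}
```

Run sync first so that the copied data has been written to disk, then call files_kib /data and used_kib /data before and after running duperemove. Because du adds up the blocks of every file, shared data is counted once per file, so files_kib is expected to stay about the same after deduplication while used_kib should drop.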

To perform the deduplication operation on the Btrfs filesystem mounted on the /data directory, run the following command:

$ sudo duperemove -dr /data

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd15.png" width="539" height="85">

The redundant data blocks from the Btrfs filesystem mounted on the /data directory should be removed.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd16.png" width="883" height="642">

As you can see,

  1. The files QGIS-OSGeo4W-3.14.0-1-Setup-x86_64.exe and QGIS-OSGeo4W-3.14.0-1-Setup-x86_64.2.exe are still in the /data directory.
  2. The total size of the files in the /data directory is still 814 MB.
  3. The files now consume only about 412 MB of disk space on the Btrfs filesystem mounted on the /data directory.

The duperemove program removed the redundant (duplicate) data blocks from the Btrfs filesystem mounted on the /data directory and saved about 408 MB (820 MB - 412 MB) of disk space.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd17.png" width="1155" height="369">
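The saving in this test is 820 MB - 412 MB = 408 MB, which is roughly half of the space used before deduplication. A tiny helper makes the same arithmetic easy to repeat with your own before and after measurements; the name savings is hypothetical, and both arguments must use the same unit.

```shell
#!/bin/sh
# savings BEFORE AFTER: hypothetical helper that prints the space saved and
# the saving as a percentage of BEFORE (both arguments in the same unit).
savings() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        saved = before - after
        printf "%d %.1f%%\n", saved, saved * 100 / before
    }'
}
```

With the numbers from this test, savings 820 412 prints 408 49.8%.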

Automatically Mounting a Btrfs Filesystem on Boot:

To mount the Btrfs filesystem you have created automatically at boot time, you need to know its UUID.

You can find the UUID of the Btrfs filesystem mounted on the /data directory with the following command:

$ sudo btrfs filesystem show /data

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd18.png" width="627" height="69">

As you can see, the UUID of the Btrfs filesystem that I want to mount at boot time is e39ac376-90dd-4c39-84d2-e77abb5e3059. It will be different for you. So, make sure to replace it with yours from now on.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd19.png" width="891" height="204">
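If you want to pick the UUID out of the btrfs filesystem show output in a script, a small awk filter is enough. This sketch assumes the UUID follows the word uuid: on the line that starts with Label:; the helper name btrfs_uuid is hypothetical.

```shell
#!/bin/sh
# btrfs_uuid: hypothetical helper that reads "btrfs filesystem show" output
# on stdin and prints the field after "uuid:" on each "Label:" line.
btrfs_uuid() {
    awk '/^Label:/ { for (i = 1; i < NF; i++) if ($i == "uuid:") print $(i + 1) }'
}
```

For example, sudo btrfs filesystem show /data | btrfs_uuid should print only the UUID. Alternatively, sudo blkid -s UUID -o value /dev/sdb1 prints the UUID of the partition directly.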

Open the /etc/fstab file with the nano text editor as follows:

$ sudo nano /etc/fstab

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd20.png" width="555" height="84">

Type in the following line at the end of the /etc/fstab file:

UUID=e39ac376-90dd-4c39-84d2-e77abb5e3059    /data    btrfs    defaults   0   0

NOTE: Replace the UUID of the Btrfs filesystem with yours. Also, change the mount options as you like.

Once you’re done, press Ctrl + X followed by Y and Enter to save the /etc/fstab file.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd21.png" width="971" height="641">
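If you prefer to script this step instead of editing the file by hand, a helper can print the fstab entry in the same format. This is a sketch; the helper name fstab_line is hypothetical, and OPTIONS falls back to defaults when omitted.

```shell
#!/bin/sh
# fstab_line UUID MOUNTPOINT [OPTIONS]: hypothetical helper that prints an
# /etc/fstab entry for a Btrfs filesystem (dump and pass fields set to 0).
fstab_line() {
    printf 'UUID=%s\t%s\tbtrfs\t%s\t0\t0\n' "$1" "$2" "${3:-defaults}"
}
```

You could append the entry once with fstab_line e39ac376-90dd-4c39-84d2-e77abb5e3059 /data | sudo tee -a /etc/fstab (running it twice would add a duplicate line). Before rebooting, sudo findmnt --verify checks /etc/fstab for errors, and sudo umount /data followed by sudo mount -a mounts the filesystem from the new entry. Catching a typo this way avoids a failed mount at boot time.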

For the changes to take effect, reboot your computer with the following command:

$ sudo reboot

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd22.png" width="403" height="83">

Once your computer boots, the Btrfs filesystem should be mounted in the /data directory, as you can see in the screenshot below.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd23.png" width="525" height="156">

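Automatically Performing Deduplication Using a Cron Job:

The duperemove command only removes the redundant data that exists on the Btrfs filesystem at the time it runs. To keep removing redundant data as new files are added, you have to run the duperemove command every once in a while.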

You can automatically run the duperemove command hourly, daily, weekly, monthly, yearly, or at boot time using a cron job.

First, find the full path of the duperemove command with the following command:

$ which duperemove

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd24.png" width="552" height="82">

As you can see, the full path of the duperemove command is /usr/bin/duperemove. Remember the path as you will need it later.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd25.png" width="557" height="219">

To edit the crontab file, run the following command:

$ sudo crontab -e

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd26.png" width="525" height="93">

Select a text editor you like and press Enter.

I will use the nano text editor. So, I will type in 1 and press Enter.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd27.png" width="589" height="297">

The crontab file should be opened.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd28.png" width="828" height="603">

To run the duperemove command on the /data directory every hour, add the following line at the end of the crontab file.

@hourly /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd29.png" width="830" height="598">

To run the duperemove command on the /data directory every day, add the following line at the end of the crontab file.

@daily /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd30.png" width="817" height="601">

To run the duperemove command on the /data directory every week, add the following line at the end of the crontab file.

@weekly /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd31.png" width="832" height="606">

To run the duperemove command on the /data directory every month, add the following line at the end of the crontab file.

@monthly /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd32.png" width="822" height="605">

To run the duperemove command on the /data directory every year, add the following line at the end of the crontab file.

@yearly /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd33.png" width="839" height="610">

To run the duperemove command on the /data directory at boot time, add the following line at the end of the crontab file.

@reboot /usr/bin/duperemove -dr /data >> /var/log/duperemove.log

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd34.png" width="821" height="598">
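If you choose a frequent schedule such as @hourly, a long duperemove run may still be going when the next one starts, and every run hashes all files again. One possible refinement, sketched below under assumptions, wraps the command in flock (from util-linux) with -n so that a run is skipped while the previous one still holds the lock, uses duperemove's --hashfile option to keep block hashes between runs, and adds 2>&1 so that error messages also reach the log. The lock file and hash file paths are hypothetical; check man duperemove for your version before relying on it.

```
# Hypothetical paths: /run/duperemove-data.lock and /var/lib/duperemove-data.hash
@hourly /usr/bin/flock -n /run/duperemove-data.lock /usr/bin/duperemove -dr --hashfile=/var/lib/duperemove-data.hash /data >> /var/log/duperemove.log 2>&1
```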

NOTE: I will run the duperemove command at boot time in this article.

Once you’re done, press Ctrl + X followed by Y and Enter to save the crontab file.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd35.png" width="821" height="598">

A new cron job should be installed.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd36.png" width="822" height="338">
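To double-check which duperemove jobs are active, you can list the root crontab with sudo crontab -l and filter out the comment lines. A minimal sketch, with a hypothetical helper name:

```shell
#!/bin/sh
# dedupe_jobs: hypothetical helper that reads a crontab listing on stdin and
# prints the active (non-comment) lines that run duperemove.
dedupe_jobs() {
    grep -v '^[[:space:]]*#' | grep 'duperemove'
}
```

For example, sudo crontab -l | dedupe_jobs should print the @reboot line added above and nothing from the commented-out header of the crontab file.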

Since this cron job runs at boot time, reboot your computer with the following command to run it:

$ sudo reboot

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd37.png" width="522" height="87">

As the duperemove command runs in the background, its output is stored in the /var/log/duperemove.log file. You can check that the log file has been created with the following command:

$ sudo ls -lh /var/log/duperemove*

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd38.png" width="829" height="132">

As you can see, the /var/log/duperemove.log file contains the duperemove log data. It means the cron job is working just fine.

<img alt="" src="https://kirelos.com/wp-content/uploads/2021/01/echo/sd39.png" width="831" height="529">

Conclusion:

In this article, I have shown you how to install the duperemove Btrfs deduplication tool on Ubuntu 20.04 LTS and Fedora 33. I have also shown you how to perform Btrfs deduplication using the duperemove tool and how to run the duperemove tool automatically using a cron job.

About the author

<img alt="Shahriar Shovon" src="https://kirelos.com/wp-content/uploads/2021/01/echo/photo2-150x150.png" width="112" height="112">

Shahriar Shovon

Freelancer & Linux System Administrator. Also loves Web API development with Node.js and JavaScript. I was born in Bangladesh. I am currently studying Electronics and Communication Engineering at Khulna University of Engineering & Technology (KUET), one of the demanding public engineering universities of Bangladesh.