Apache Hadoop is a free, open-source, Java-based software platform for storing and analyzing large datasets on clusters of machines. It keeps its data in the Hadoop Distributed File System (HDFS) and processes it with MapReduce. Hadoop is used for machine learning and data mining workloads, and for managing work across multiple dedicated servers.

The primary components of Apache Hadoop are:

  • HDFS: A file system that is distributed across the nodes of a cluster.
  • MapReduce: A framework for developing applications that process massive amounts of data.
  • Hadoop Common: A set of libraries and utilities needed by the other Hadoop modules.
  • Hadoop YARN: The resource-management layer, which schedules jobs and allocates cluster resources.

The following sections walk through installing and configuring Apache Hadoop on an Ubuntu system.

How to install Apache Hadoop on Ubuntu

First, open the Ubuntu terminal by pressing “CTRL+ALT+T”, or by typing “terminal” in the application search bar.

The next step is to update the system repositories:

$ sudo apt update

Next, install Java (OpenJDK 11) with the following command:

$ sudo apt install openjdk-11-jdk

Enter “y” or “Y” when prompted to let the installation continue.

Now, verify the Java installation by checking its version:

$ java -version

We will create a separate user for running Apache Hadoop on our system by utilizing the “adduser” command:

$ sudo adduser hadoopuser

Enter the new user’s password, full name, and other information. Type “y” or “Y” to confirm that the provided information is correct.

Now switch from the current user to the newly created Hadoop user, which is “hadoopuser” in our case:

$ su - hadoopuser

Now, generate a private and public key pair with the following command:

$ ssh-keygen -t rsa

Enter the file path where you want to save the key pair. After this, enter a passphrase for the key; leaving it empty lets the Hadoop start scripts log in over SSH without prompting.

Next, append the public key to the SSH “authorized_keys” file:

$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

With the public key stored in “authorized_keys”, change the file permissions to “640”. This gives the file’s owner read and write permission, gives the group read permission only, and grants no permission to other users:

$ chmod 640 ~/.ssh/authorized_keys
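
To see what mode “640” means in practice, the following minimal sketch applies it to a scratch file rather than to the real “authorized_keys” file. The variable names are illustrative only, and the stat options assume GNU stat as shipped on Ubuntu:

```shell
# Work on a scratch copy so the real ~/.ssh directory is not touched.
demo_file="$(mktemp)"
chmod 640 "$demo_file"

# %a prints the octal mode, %A the symbolic form (GNU stat).
mode_octal="$(stat -c '%a' "$demo_file")"
mode_symbolic="$(stat -c '%A' "$demo_file")"
echo "$mode_octal $mode_symbolic"   # 640 -rw-r-----

rm -f "$demo_file"
```

In the symbolic form, “rw-” is the owner, “r--” is the group, and “---” is everyone else.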

Now verify that key-based login to localhost works by running the following command:

$ ssh localhost

Use the following wget command to download the Hadoop release archive. If this release is no longer available at this URL, it may have moved to the Apache archive (archive.apache.org):

$ wget https://downloads.apache.org/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Extract the downloaded “hadoop-3.3.0.tar.gz” file with the tar command:

$ tar -xvzf hadoop-3.3.0.tar.gz


Rename the extracted directory to “hadoop”, which is the name the paths later in this article assume:

$ mv hadoop-3.3.0 hadoop

Now, configure the Java environment variables for Hadoop. First, find the directory that “JAVA_HOME” should point to:

$ dirname $(dirname $(readlink -f $(which java)))
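
The nested command works from the inside out: it finds the java binary, follows its symbolic links to the real file, and then removes the last two path components (“bin” and “java”). A minimal sketch of the same idea, where java_home_from_bin is a hypothetical helper name and not part of Java or Hadoop:

```shell
# Hypothetical helper: strip the trailing /bin/java from a resolved java path.
java_home_from_bin() {
  dirname "$(dirname "$1")"
}

# Example with the path used in this article (no Java installation required).
java_home_from_bin /usr/lib/jvm/java-11-openjdk-amd64/bin/java

# On a machine with Java installed, resolve the real binary behind the symlinks.
if command -v java >/dev/null 2>&1; then
  java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

The first call prints /usr/lib/jvm/java-11-openjdk-amd64, which is the value exported as “JAVA_HOME” in this article.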

Open the “~/.bashrc” file in the “nano” text editor:

$ nano ~/.bashrc

Add the following lines to the opened “~/.bashrc” file:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoopuser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

After that, press “CTRL+O” to save the changes made to the file.

Now, run the following command to load the new environment variables, including “JAVA_HOME”, into the current session:

$ source ~/.bashrc

Next, open Hadoop’s environment variable file:

$ nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

We have to set our “JAVA_HOME” variable in the Hadoop environment:

export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64


Again, press “CTRL+O” to save the file content.

How to configure Apache Hadoop on Ubuntu

Up to this point, we have installed Java and Hadoop, created a Hadoop user, and configured SSH key-based authentication. Now we will configure Apache Hadoop on the Ubuntu system. The first step is to create two directories, “namenode” and “datanode”, inside the home directory of the Hadoop user:

$ mkdir -p ~/hadoopdata/hdfs/namenode


$ mkdir -p ~/hadoopdata/hdfs/datanode


We will update the Hadoop “core-site.xml” file by adding our hostname, so first confirm your system hostname by executing this command:

$ hostname

Now, open up the “core-site.xml” file in your “nano” editor:

$ nano $HADOOP_HOME/etc/hadoop/core-site.xml


Our system hostname is “linuxhint-VBox”. Add the following lines, with your own system’s hostname, to the opened “core-site.xml” file:

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop.linuxhint-VBox.com:9000</value>
        </property>
</configuration>
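
Because only the host name and port change from system to system, the same file can also be generated from a small template. The following is a rough sketch rather than an official Hadoop tool; write_core_site and scratch_file are hypothetical names, and the output goes to a temporary file so nothing under $HADOOP_HOME is overwritten:

```shell
# Hypothetical helper: write a minimal core-site.xml for the given host and port.
write_core_site() {
  # $1 = host name, $2 = port, $3 = output file
  cat > "$3" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$1:$2</value>
  </property>
</configuration>
EOF
}

# Write to a scratch file first and inspect it before copying it into place.
scratch_file="$(mktemp)"
write_core_site hadoop.linuxhint-VBox.com 9000 "$scratch_file"
cat "$scratch_file"
```

Replace the host name with your own before copying the result to $HADOOP_HOME/etc/hadoop/core-site.xml.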

Press “CTRL+O” to save the file.

In the “hdfs-site.xml” file, we will change the directory path of “datanode” and “namenode”:

$ nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml


<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.name.dir</name>
                <value>file:///home/hadoopuser/hadoopdata/hdfs/namenode</value>
        </property>
        <property>
                <name>dfs.data.dir</name>
                <value>file:///home/hadoopuser/hadoopdata/hdfs/datanode</value>
        </property>
</configuration>
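
After editing, it can help to read a setting back and confirm it was saved as intended. On a working installation, the “hdfs getconf -confKey dfs.replication” command does this. Without Hadoop on the PATH, a rough text-based check is sketched below; get_property is a hypothetical helper that assumes the name and value tags sit on consecutive lines, and the demo runs on a scratch file:

```shell
# Hypothetical helper: print the <value> that follows a given <name> line.
# Assumes <name> and <value> sit on consecutive lines.
get_property() {
  # $1 = config file, $2 = property name
  grep -F -A1 "<name>$2</name>" "$1" | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Self-contained demo on a scratch file instead of the real hdfs-site.xml.
demo_conf="$(mktemp)"
cat > "$demo_conf" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

get_property "$demo_conf" dfs.replication
```

This is a plain text match, not an XML parser, so it is only suitable for quick checks on simply formatted files.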

Again, press “CTRL+O” to write the added code to the file.

Next, open up the “mapred-site.xml” file and add the below-given code in it:

$ nano $HADOOP_HOME/etc/hadoop/mapred-site.xml


<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

Press “CTRL+O” to save the changes you made to the file.

The last file that needs to be updated is the “yarn-site.xml”. Open this Hadoop file in the “nano” editor:

$ nano $HADOOP_HOME/etc/hadoop/yarn-site.xml


Add the following lines to the “yarn-site.xml” file:

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

Before starting the Hadoop cluster, format the “namenode”:

$ hdfs namenode -format

Now start the Hadoop cluster by running the following command in your terminal:

$ start-dfs.sh

If you get a “Could not resolve hostname” error while starting the Hadoop cluster, add an entry for your hostname to the “/etc/hosts” file. Save the “/etc/hosts” file, and then start the Hadoop cluster again:

$ start-dfs.sh
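
Whether a name resolves can be checked before starting the cluster. The sketch below uses the standard getent tool, which consults “/etc/hosts” as well as DNS; host_resolves is a hypothetical helper name, and “localhost” stands in for the host name used in “core-site.xml”:

```shell
# Hypothetical helper: succeed if the name resolves via /etc/hosts or DNS.
host_resolves() {
  getent hosts "$1" >/dev/null
}

if host_resolves localhost; then
  echo "localhost resolves"
else
  echo "localhost does not resolve; check /etc/hosts"
fi
```

Run the same check with your own host name in place of “localhost”.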

In the next step, start the YARN service of Hadoop:

$ start-yarn.sh

The command reports that it is starting the ResourceManager and the NodeManagers.

To check the status of all Hadoop services, execute the “jps” command in your terminal:

$ jps

If all services are running, the output lists NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager, along with Jps itself.
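
The same check can be scripted. The following is a minimal sketch, assuming the five daemons of a single-node HDFS and YARN setup; missing_daemons is a hypothetical helper, and the sample text is made up for illustration rather than captured from a real run:

```shell
# Hypothetical helper: print each expected daemon that is absent from jps-style output.
missing_daemons() {
  # $1 = text in the format printed by jps, e.g. "12345 NameNode"
  for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # -w matches whole words only, so "NameNode" does not match "SecondaryNameNode".
    printf '%s\n' "$1" | grep -qw "$name" || echo "$name"
  done
}

# Made-up sample in which only the HDFS daemons are running.
sample="101 NameNode
102 DataNode
103 SecondaryNameNode
104 Jps"
missing_daemons "$sample"
```

For the sample above, the helper prints ResourceManager and NodeManager. On a live system, pass it the real output with missing_daemons "$(jps)"; no output means all five daemons were found.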

The Hadoop web interfaces listen on ports 9870 and 8088, so you need to allow these ports through the firewall. The commands below assume firewalld is the active firewall:

$ firewall-cmd --permanent --add-port=9870/tcp

$ firewall-cmd --permanent --add-port=8088/tcp

Now, reload the firewall settings:

$ firewall-cmd --reload

Now, open your browser and access the Hadoop “namenode” web interface by entering your IP address followed by port 9870.

Use port “8088” with your IP address to access the Hadoop resource manager.

On the Hadoop web interface, scroll down the page to find the “Browse Directory” section.

That was all about installing and configuring Apache Hadoop on the Ubuntu system. To stop the Hadoop cluster, stop the “namenode” and “yarn” services:

$ stop-dfs.sh

$ stop-yarn.sh

Conclusion

Apache Hadoop is a freely available platform for managing, storing, and processing data for big data applications on clustered servers. Its fault-tolerant distributed file system, HDFS, stores the data across nodes, and the MapReduce model processes it in parallel. In this article, we have shown you how to install and configure Apache Hadoop on your Ubuntu system.