Sunday, December 8, 2013

My Private Cloud

It is costly to rent systems from Azure or Amazon when I need many machines for regular experiments at home for learning purposes. I have four fairly powerful machines lying around at home with a lot of RAM and 32 processor cores in total. I bought them second hand from eBay for less than $500. These machines would not be economical in a datacenter because of their limited processing power and RAM capacity, but for experimental purposes I can easily create hundreds of virtual machines (most of them won't be running most of the time).

Now, installing an operating system on each of the virtual machines is a very time-consuming task, so I decided to go with MAAS on Ubuntu Server, which handles the OS installation part. And with juju I can deploy a lot of services (hadoop, mongodb, django and many more) with a single command.

I will not be exposing the services outside my home to other people, so I do not require high bandwidth. But I still want to see how it all works from the Internet.

I split my home network into two subnets (192.168.0.0/24 and 192.168.1.0/24). One powers my home equipment and the other is for my private cloud machines.

I have a router (192.168.0.1) that is connected to the Internet and acts as the Internet gateway.

I have another system (the cloud gateway) running Linux with two network interfaces (192.168.0.100, 192.168.1.2), configured as a router and gateway for my private cloud. It has packet forwarding enabled and the forward rules set up in iptables.

I have one virtual machine on the cloud gateway that I have configured as the MAAS server.
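
The overall topology (all addresses as described above):

Internet --- [router 192.168.0.1] --- home LAN 192.168.0.0/24
                                            |
                        [cloud gateway 192.168.0.100 | 192.168.1.2]
                                            |
                              cloud subnet 192.168.1.0/24 (switch)
                               |                          |
                  [MAAS server VM 192.168.1.3]   [cloud nodes 192.168.1.110-149]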

Cloud Gateway


Dual interface: 192.168.0.100, 192.168.1.2

192.168.0.100 is connected to the Internet router by a direct link [GW + DNS/DHCP: 192.168.0.1].
192.168.1.2 is connected to the cloud machines' switch (no DNS/DHCP here; both are managed by the MAAS server, 192.168.1.3).

/etc/network/interfaces

# br0 gets its address via DHCP, but the IP is effectively static:
# the router's DHCP server always hands out the same reserved address
auto br0
iface br0 inet dhcp
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0

auto br1
iface br1 inet static
        address 192.168.1.2
        netmask 255.255.255.0
        network 192.168.1.0
        broadcast 192.168.1.255
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0


Enable forwarding:

sudo su
echo 1 > /proc/sys/net/ipv4/ip_forward

To enable it on every boot, create a small init script:

/etc/init.d/lekhonicloud
#!/bin/sh
# turn ip_forward on
echo 1 > /proc/sys/net/ipv4/ip_forward

sudo chmod +x /etc/init.d/lekhonicloud
sudo update-rc.d lekhonicloud defaults
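
Alternatively, the forwarding flag can be made persistent via /etc/sysctl.conf; on Ubuntu the line is usually already there, commented out:

net.ipv4.ip_forward=1

Apply it without a reboot using sudo sysctl -p.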


Allow established and related traffic to be forwarded between the two subnets:

iptables -A FORWARD -p tcp -d 192.168.1.0/255.255.255.0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p udp -d 192.168.1.0/255.255.255.0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p tcp -d 192.168.0.0/255.255.255.0 -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A FORWARD -p udp -d 192.168.0.0/255.255.255.0 -m state --state RELATED,ESTABLISHED -j ACCEPT
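
These rules are not persistent across reboots. One way to keep them (the iptables-persistent package is another) is to save them once and restore them from the boot script above:

sudo sh -c 'iptables-save > /etc/iptables.rules'

Then add this line to /etc/init.d/lekhonicloud:

iptables-restore < /etc/iptables.rules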


Internet router


Dual Interface:
LAN 192.168.0.1
WAN [Static IP mapped to public domain name]

The router needs a static route so that traffic for the cloud subnet is sent via the cloud gateway:

Destination     Gateway         Netmask         Flags Metric Ref    Use Iface
192.168.1.0     192.168.0.100   255.255.255.0   UG    1      0        0 LAN
192.168.0.0     *               255.255.255.0   U     0      0        0 LAN
default         [ISP Gtwy IP]   0.0.0.0         UG    0      0        0 WAN

MAAS System Settings


From the MAAS Network Config page:
Interface: eth0
Management: Manage DHCP and DNS
IP: 192.168.1.3
Subnet mask: 255.255.255.0
Router: 192.168.1.2
Broadcast: 192.168.1.255

IP Range: 192.168.1.110-192.168.1.149

/etc/network/interfaces

# The primary network interface
auto eth0
iface eth0 inet static
        address 192.168.1.3
        netmask 255.255.255.0
        gateway 192.168.1.2
        #dns-nameservers 192.168.1.3

auto eth1
iface eth1 inet dhcp
    #dns-nameservers 192.168.0.1

Install and configure MAAS

Installing MAAS is easy. Details are available at http://maas.ubuntu.com/docs/install.html; I am putting the basic steps here. After you boot a system from an Ubuntu Server installation disk, from the first screen select

Multiple server install with MAAS

Then follow the setup. When the Install or enlist with Ubuntu MAAS Server dialog box appears, select

Create a new MAAS on this server.

When you are asked for the Ubuntu MAAS API address on the Configuring maas-cluster-controller screen, provide the right address. It should be automatically detected and shown, but if it is not correct, change it:


http://<maas ip address>/MAAS/



If you want MAAS to manage DHCP and DNS install those components:

sudo apt-get install maas-dhcp maas-dns

It makes things much easier.

After these steps, create an admin user and import the boot images:

$ sudo maas createadmin --username=root --email=<adminemail>
$ maas-cli maas node-groups import-boot-images
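
Note that maas-cli talks to the API through a logged-in profile (named maas above). If the import command complains, log in first; per the MAAS 1.x docs of this era the login looks like this, with the key copied from the web UI Preferences page:

$ maas-cli login maas http://<maas ip address>/MAAS/api/1.0 <MAAS API key>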

You may now log in to http://<maas ip address>/MAAS/ and start configuring.

Commission and start your servers


When you first start a server that is not yet known to MAAS and has PXE boot enabled, it will boot from an image served by the MAAS-maintained DHCP/TFTP server, collect some data about its hardware configuration, report that back to MAAS, and shut itself down. The node then appears in the MAAS node list. You can change some of its parameters, such as host name and power options, and press the Commission button to start commissioning. If the system does not power up on its own, you may power it up manually. It will again boot from the MAAS image and power itself off, after which it is shown as Ready in the MAAS console. You may now start the node; on boot it will install Ubuntu (you can choose which version from the MAAS web console).

You may keep any node in the Ready state so that juju can pick it up for deploying a service. We will configure juju shortly. Since I have only four physical servers, I'll be using virtual machines to create the juju node pool; MAAS treats both types of machine the same way.
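
Once KVM is installed on the servers (see the next section), the pool VMs can be created with virt-install. A minimal sketch, with name, sizes, and bridge of my choosing; the VM must boot from the network on the cloud subnet so MAAS can enlist it, and I assume the host bridges its cloud-facing NIC as br0:

virt-install --name node01 --ram 2048 --vcpus 2 \
    --disk path=/var/lib/libvirt/images/node01.img,size=20 \
    --network bridge=br0 --boot network,hd \
    --graphics vnc --noautoconsole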

MAAS inserts the SSH keys that we set in the MAAS Preferences page for the default user "ubuntu". But if the network interface is down, it is hard to get the machine back. While it may be bad security practice, I create a local user on each physical server so that I can log in through the console.
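
A minimal sketch of that (the user name is arbitrary):

$ sudo adduser rescue
$ sudo adduser rescue sudo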

Install KVM on each Server

First install the required components and add your user to the right groups:

sudo apt-get install qemu-kvm libvirt-bin ubuntu-vm-builder bridge-utils
 
sudo adduser `id -un` kvm
sudo adduser `id -un` libvirtd
 
Now log out, log back in, and verify the installation using:
 
ubuntu@Server06:~$ virsh -c qemu:///system list
 Id    Name                           State
----------------------------------------------------

Setting up virsh on MAAS system


From the MAAS system, the query
 
$ virsh -c qemu+ssh://<username>@<serverip>/system list --all 
 
should show all virtual machines on the server when you are logged in as the maas
user (sudo su - maas). If it does not, follow the steps below.

First make sure the user maas has a home directory (/home/maas).
If not, create it and assign /bin/bash as the default shell:
$ sudo mkdir /home/maas
$ sudo chown maas:maas /home/maas
$ sudo chsh -s /bin/bash maas

Now sudo as maas:

$ sudo su - maas
 
Now generate the key and copy it to the server:
 
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa <username>@<server_ip>
 
The username here can be any user on the server. If that user is not already set up
with SSH keys, you'll be prompted for a password.
 
Now test again if the command works:
 
$ virsh -c qemu+ssh://<username>@<serverip>/system list --all 
 
If it still fails, look at http://wiki.libvirt.org/page/Failed_to_connect_to_the_hypervisor for more troubleshooting information.

Now you may use qemu+ssh://<username>@<serverip>/system in the node configuration page on the MAAS portal.

Power Type: virsh (virtual systems)
Address: qemu+ssh://<username>@<serverip>/system
Power Id: <Virtual machine name>

The <virtual machine name> should be the same as shown by the command

virsh -c qemu+ssh://<username>@<serverip>/system list --all

With this setup MAAS should automatically start and stop the system. When it says "This node is now allocated to you. It has been asked to start up." make sure the system actually has been powered up.


Install and Configure Juju

First install the components:


sudo apt-get install juju-core juju

Then create a configuration file ~/.juju/environments.yaml

default: maas
environments:
  maas:
    type: maas
    # Change this to where your MAAS server lives.  It must specify the base path.
    maas-server: 'http://<MAAS Server IP>/MAAS/'
    maas-oauth: '<MAAS API key>'
    admin-secret: '<admin secret>'
    default-series: precise
    authorized-keys-path: ~/.ssh/id_rsa.pub # or any file you want
    # Or:
    # authorized-keys: ssh-rsa keymaterialhere

You can get the MAAS API key from the MAAS Preferences page.

You can generate a template for this file with the juju generate-config command and then edit it, keeping only the MAAS configuration and putting in the keys.
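
For example:

$ juju generate-config
$ vim ~/.juju/environments.yaml

The generated file contains boilerplate sections for several providers; delete everything except the maas block.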

Now do the bootstrap using

$ juju bootstrap --upload-tools

And see the status using

$ juju status 

You may now deploy hadoop:

$ juju deploy hadoop

This will pick a machine from the MAAS machine pool, install Linux on it, and then install Hadoop.
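
If the charm supports it, you can grow a service later and juju will pull additional machines from the same pool (a sketch; check the hadoop charm's documentation for master/slave specifics):

$ juju add-unit hadoop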

 

References 

http://askubuntu.com/questions/95354/how-to-enable-forwarding-of-data-between-two-local-interfaces

http://maas.ubuntu.com/docs/install.html

https://help.ubuntu.com/community/KVM/Installation

http://askubuntu.com/questions/292061/how-to-configure-maas-to-be-able-to-boot-virtual-machines


http://wiki.libvirt.org/page/Failed_to_connect_to_the_hypervisor

http://maas.ubuntu.com/docs/juju-quick-start.html

Sunday, October 20, 2013

Hadoop quickstart: Run your first MapReduce job in 10 minutes

We set up a single-machine cluster using a few simple steps. This is an experimental system to get started with Hadoop quickly. Once we are familiar with the core components, we can easily add more nodes to the system. There are a few differences from the official quickstart guide.

Assuming you have downloaded the latest Hadoop binary from the Apache Hadoop site, let's start configuring it.

Let's assume you have extracted the contents at

/home/hduser/hadoop

Now let us generate an SSH key so that we can connect to the machine over SSH without typing a password.

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

This should do the trick. Make sure you can connect to localhost without typing a password:

$ ssh localhost


You should be connected to the local machine. Exit the session before the next steps.

Install JDK 7 on your machine. I am assuming the JDK is installed at /usr/lib/jvm/java-7-openjdk-amd64.

Let's edit .bashrc (or your shell's init script if you use something other than bash) and add the following lines at the end.

export HADOOP_HOME=/home/hduser/hadoop
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export PATH=$PATH:$HADOOP_HOME/bin

Now edit the file $HADOOP_HOME/etc/hadoop/hadoop-env.sh and add the following line after the export JSVC_HOME=${JSVC_HOME} line:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Now we need to edit three configuration files in Hadoop. In each file we'll be adding a few property blocks inside the <configuration> ... </configuration> block.

$HADOOP_HOME/etc/hadoop/core-site.xml

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>TMP Directory</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
  <description>NameNode URI</description>
</property>

$HADOOP_HOME/etc/hadoop/mapred-site.xml

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>MapReduce job tracker</description>
</property>

$HADOOP_HOME/etc/hadoop/hdfs-site.xml

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication</description>
</property>

After editing the files, we are ready to format the filesystem:

$ hdfs namenode -format

It will print a lot of information, and among it you should see one line like:

13/10/20 11:02:21 INFO common.Storage: Storage directory /home/hduser/tmp/dfs/name has been successfully formatted.

Now the HDFS filesystem is ready. Let's create a directory to store our data files.

hadoop fs -mkdir /user
hadoop fs -mkdir /user/hduser
hadoop fs -mkdir /user/hduser/data

Now we need sample data files, each containing a lot of words. You may use a free book in text format. I have created three sample text files and put them in the /home/hduser/data/ directory on my machine.
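
If you don't have books handy, here is a quick throwaway way to fabricate some word-heavy files (assuming these man pages are installed; any plain text will do):

mkdir -p /home/hduser/data
man bash | col -b > /home/hduser/data/file1.txt
man ssh  | col -b > /home/hduser/data/file2.txt
man ls   | col -b > /home/hduser/data/file3.txt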

We can copy these files to HDFS using the following command:

#Usage: hadoop fs -copyFromLocal <localsrc> URI
bin/hdfs dfs -copyFromLocal  /home/hduser/data/* /user/hduser/data

The copyFromLocal operation takes two paths: the first is the local file path and the second is the destination HDFS URI. We can see the files using the -ls command:

hduser@hadoop:~/hadoop$ hdfs dfs -ls /user/hduser/data


-rw-r--r--   1 hduser supergroup     538900 2013-10-20 11:08 /user/hduser/data/file1.txt
-rw-r--r--   1 hduser supergroup     856710 2013-10-20 11:08 /user/hduser/data/file2.txt
-rw-r--r--   1 hduser supergroup     357800 2013-10-20 11:08 /user/hduser/data/file3.txt

Now let's start the Hadoop cluster:

hduser@hadoop:~/hadoop$ sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
localhost:  09:36:31 up 14:05,  2 users,  load average: 0.02, 0.04, 0.08
[output removed]
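
You can verify that the daemons came up with jps (it ships with the JDK). The PIDs below are illustrative; for Hadoop 2.2 you should see these five processes:

hduser@hadoop:~/hadoop$ jps
5034 NameNode
5170 DataNode
5370 SecondaryNameNode
5523 ResourceManager
5650 NodeManager
5923 Jps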

We are now ready to run the MapReduce sample that ships with Hadoop. Let's copy the jar file to our current directory.

cp /home/hduser/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar ./

And run it by providing the MapReduce class name, the data location, and the output location:

bin/hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /user/hduser/data /user/hduser/wcoutput

13/10/20 11:25:36 INFO input.FileInputFormat: Total input paths to process : 3
13/10/20 11:25:36 INFO mapreduce.JobSubmitter: number of splits:3
13/10/20 11:25:37 INFO mapreduce.Job: Running job: job_local3325941_0001
[output removed]
13/10/20 11:25:43 INFO mapreduce.Job: Job job_local3325941_0001 completed successfully

OK. The job completed successfully. We should be able to see the output files in the output directory.

hduser@hadoop:~/hadoop$ hdfs dfs -ls /user/hduser/wcoutput/

-rw-r--r--   1 hduser supergroup          0 2013-10-20 11:25 /user/hduser/wcoutput/_SUCCESS
-rw-r--r--   1 hduser supergroup     880838 2013-10-20 11:25 /user/hduser/wcoutput/part-r-00000
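
You can also peek at the word counts directly in HDFS:

hduser@hadoop:~/hadoop$ hdfs dfs -cat /user/hduser/wcoutput/part-r-00000 | head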

Now we can download the output file to our local disk:

#Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>
$ bin/hdfs dfs -copyToLocal /user/hduser/wcoutput/part-r-00000 ./wc_result.txt

And view it if you want:

$ vim wc_result.txt

Finally, we stop the cluster:

$ sbin/stop-all.sh