Hadoop on Docker as a single-node cluster



Start your instance by running the following command:


docker run -it sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

or

docker run -it -p 50070:50070 sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

You can browse HDFS through the NameNode web UI at the following URL:

http://192.168.99.100:50070/explorer.html#/

(192.168.99.100 is the default Docker Toolbox VM IP; run docker-machine ip to check yours. With Docker for Windows/Mac, use localhost instead.)





Next, run the following command to mount a host folder into the container for Hive:

docker run -it -p 50070:50070 -v c:/tmp/hive:/hive sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash

For Docker Toolbox you might need to run the following command instead:

docker run -it -p 50070:50070 -v /hive:/hive sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash


To submit a sample MapReduce job:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar grep input output 'dfs[a-z.]+'

$HADOOP_HOME/bin/hdfs dfs -cat output/*
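What the example grep job computes can be sketched locally without Hadoop: it extracts every match of the regex from the input files and counts occurrences of each distinct match. The input file below is illustrative, not part of the original walkthrough:

```shell
# Local sketch of what the hadoop-mapreduce-examples "grep" job does:
# pull out every regex match, then count occurrences of each distinct match.
printf 'dfs.replication\nfs.defaultFS\ndfs.namenode.name.dir\ndfs.replication\n' > /tmp/grep-input.txt
grep -oE 'dfs[a-z.]+' /tmp/grep-input.txt | sort | uniq -c | sort -rn
# prints counts like: 2 dfs.replication, 1 dfs.namenode.name.dir
```

The MapReduce job writes the same kind of (count, match) pairs into the output directory, which is what the -cat command above displays.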




Turning on Hive (this step is optional)


To enable Hive, download a Hive binary release and extract it to c:/tmp/hive on the host (the docker run command above already mounts that folder as /hive inside the container).



Export the environment variables and create the Hive warehouse directories in HDFS:

export HADOOP_HOME=/usr/local/hadoop-2.7.0
export HIVE_HOME=/hive

$HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir       /user
$HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive
$HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse


Initialize your metastore

$HIVE_HOME/bin/schematool -initSchema -dbType derby

(This works; note that -dbType requires a database type argument. A slightly different invocation such as

$HIVE_HOME/bin/schematool -dbType -initSchema

fails because -dbType is left without its argument.)



Starting Hive

Start Hive using the following command:

$HIVE_HOME/bin/hive







Starting HiveServer2

HiveServer2 listens on port 10000 by default; leave it running and connect from another shell:

$HIVE_HOME/bin/hiveserver2


Starting Beeline

$HIVE_HOME/bin/beeline -u jdbc:hive2://localhost:10000 -n maria_dev

(the -n flag supplies the user name to connect as)





$HADOOP_HOME/bin/hdfs dfs -mkdir /app
$HADOOP_HOME/bin/hdfs dfs -put /hive/sample.csv /app
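The put command assumes a sample.csv already exists in the mounted /hive folder. A minimal two-column file matching the (id, Code) table created next could be generated like this (the contents are illustrative):

```shell
# Create an illustrative two-column CSV (id,Code) in the host folder
# that the docker run command mounts as /hive inside the container.
mkdir -p /tmp/hive
printf '1,AA\n2,BB\n3,CC\n' > /tmp/hive/sample.csv
cat /tmp/hive/sample.csv
```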



CREATE SCHEMA IF NOT EXISTS bdp;
CREATE EXTERNAL TABLE IF NOT EXISTS bdp.hv_csv_table
(id STRING, Code STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://localhost:8020/app';

(Note that LOCATION must point to the directory containing the file, not to the file itself.)
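Once the table exists, a quick query from the hive or beeline prompt confirms that Hive can read the CSV (schema and table names as created above):

```sql
-- Read back the rows loaded from sample.csv
SELECT * FROM bdp.hv_csv_table LIMIT 10;

-- Count rows across all files under the table's location
SELECT COUNT(*) FROM bdp.hv_csv_table;
```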




