The Metastore stores all the information about the tables, their partitions, the schemas, the columns and their types, the table locations etc. This information can be queried or modified using a thrift interface and as a result it can be called from clients in different programming languages. As this information needs to be served fast to the compiler, they have chosen to store this information on a traditional RDBMS.
By default, hive use an embedded Derby database to store metadata information. The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
By default, hive use an embedded Derby database to store metadata information. The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
The Metastore is an application that runs on an RDBMS and uses an open source ORM layer called DataNucleus, to convert object representations into a relational schema and vice versa. They chose this approach as opposed to storing this information in hdfs as they need the Metastore to be very low latency. The DataNucleus layer allows them to plugin many different RDBMS technologies.
Facebook uses MySQL to store metadata information.
Store Hive metadata into MySQL:
We need to perform following steps to setup a metastore in a MySQL server.
1. Install mysql-server.
sudo apt-get install mysql-server
2. Use su command. you will see password promt. Type your root
userpassword.
3. If your password correct, you'll see like this; root@yourdesktopname#.
4. Now, Start mysql server.
/etc/init.d/mysql start.
5. When mysql server started, type mysql -u root mysql.
6. Next, I'm going to create a new MySQL user for hadoop/hive.
mysql> CREATE USER 'hadoop'@'localhost' IDENTIFIED BY 'hadoop';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'localhost' WITH
GRANT OPTION;
mysql> exit;
7. su – userName // In my case userName is hadoop. So, switch to that user.
8. mysql -h localhost -u userName -p.
9. Now, we need to change the hive configuration so it can use MySQL:
10. Go to hive/conf directory, then open hive-default.xml and perform
following changes.
<!-- In local metastore setup, each Hive Client will open a connection to
the datastore and make SQL queries against it.-->
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>
jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>Username</value><!-- In my case UserName is hadoop-->
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>UserPassword</value><!-- In my case password is hadoop-->
</property>
11. Hive needs to have the MySQL jdbc drivers, so we need to download and
12. If all settings are done correctly, we can do this:-
copy it to hive/lib folder.
impetus@ubuntu:~$ cd HIVE_HOME
impetus@ubuntu:~/hive$ bin/hive
Hive history file=/tmp/ankit
/hive_job_log_ankit_201102211937_456962737.txt
hive>
hive>create table temp(info INT);
OK
Time taken: 4.745 seconds
hive> show tables;
OK
temp
Time taken: 0.137 seconds
hive>
13. Again, start MySQL server.
impetus@ubuntu:~$mysql -h localhost -u hadoop -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 84
Server version: 5.1.41-3ubuntu12.8 (Ubuntu)
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
14. Use following command to view all existing databases.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
+--------------------+
3 rows in set (0.05 sec)
mysql> use hive; //Select database.
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
15. Use following command to view all exiting tables.
mysql> show tables;
+--------------------+
| Tables_in_hive |
+--------------------+
| BUCKETING_COLS |
| COLUMNS |
| DBS |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SORT_COLS |
| TABLE_PARAMS |
| TBLS |
+--------------------+
16 rows in set (0.00 sec)
mysql>
16. Use following command to view the metadata information of hive table
columns.
mysql> select * from COLUMNS;
+-------+---------+-------------+-----------+-------------+
|SD_ID|COMMENT|COLUMN_NAME|TYPE_NAME|INTEGER_IDX|
+-------+---------+-------------+-----------+-------------+
| 1 | NULL | info | int | 0 |
+-------+---------+-------------+-----------+-------------+
1 rows in set (0.00 sec)
mysql>
//Where info is the column of temp table.
17. Similarly, you can View the information of partiotions, Bucketing,
tables, etc.