Hive Introduction:-
Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files.
The size of data sets being collected and analyzed in the industry for business intelligence is growing rapidly, making traditional warehousing solutions prohibitively expensive. Hadoop is a popular open-source map-reduce implementation used at companies such as Yahoo and Facebook to store and process extremely large data sets on commodity hardware. However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse. In this blog, we present Hive, an open-source data warehousing solution built on top of Hadoop. Hive supports queries expressed in a SQL-like declarative language, HiveQL, which are compiled into map-reduce jobs that are executed using Hadoop.
Hive Installation:-
1. Requirements:-
- Java 1.6
- Hadoop 0.17.x to 0.20.x
2. Unpack the release tarball:
$ tar -xzvf hive-x.y.z.tar.gz
3. Set the environment variable HIVE_HOME to point to the installation directory.
4. Hive uses Hadoop, which means:
- you must have hadoop in your path, OR
- export HADOOP_HOME=<hadoop-install-dir>
5. In addition, you must create /tmp and /user/hive/warehouse in HDFS and set them chmod g+w before a table can be created in Hive:
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /user/hive/warehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
Hive Command Line Interface:-
$ $HIVE_HOME/bin/hive
DDL Operation:-
Create Hive Table:-
1. hive> Create table info(name INT , add STRING);
2. hive> Create table data(name INT , add STRING) PARTITIONED BY (date STRING);
(Creates a table called data with two columns and a partition column called date. The partition column is a virtual column. It is not part of the data itself but is derived from the partition that a particular dataset is loaded into.)
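Conceptually, each partition value maps to its own subdirectory under the table's directory in the warehouse. A minimal Python sketch of that layout (the warehouse path shown is Hive's default; the actual location depends on configuration):

```python
# Sketch of Hive's partition layout: each partition value becomes a
# subdirectory named <column>=<value> under the table's directory.
WAREHOUSE = "/user/hive/warehouse"  # Hive's default warehouse location

def partition_path(table, partition_col, value):
    """Return the HDFS directory Hive would use for one partition."""
    return f"{WAREHOUSE}/{table}/{partition_col}={value}"

print(partition_path("data", "date", "20-11-2010"))
# /user/hive/warehouse/data/date=20-11-2010
```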
Show table:-
1. hive> Show tables;
2. hive> Show tables '.*s';
(Lists all tables that end with 's'.)
Describe table:-
1. hive> Describe info;
(Shows the list of columns.)
Alter table:-
1. hive> Alter table info ADD COLUMNS (PNO INT);
2. hive> Alter table info RENAME TO information;
Drop table:-
1. hive> Drop table data;
DML Operation:-
Load data:-
1. hive>LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE info;
(Loads a file that contains two columns separated by ctrl-a into info table. 'local' signifies that the input file is on the local file system. If 'local' is omitted then it looks for the file in HDFS.)
2. hive>LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE data PARTITION (date='20-11-2010');
3. hive>LOAD DATA INPATH 'hdfs://localhost:9000/user/hive/warehouse/kv2.txt' OVERWRITE INTO TABLE data PARTITION (date='21-11-2010');
(The above command will load data from an HDFS file/directory to the table. Note that loading data from HDFS will result in moving the file/directory.)
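To see why the Ctrl-A delimiter mentioned above matters, here is a small Python sketch of how such a line splits into columns (the sample line is illustrative, not taken from kv1.txt):

```python
# Hive's default field delimiter is Ctrl-A (\x01). A delimited line
# splits into columns like this (sample line is illustrative):
line = "238\x01val_238\n"
fields = line.rstrip("\n").split("\x01")
print(fields)  # ['238', 'val_238']
```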
SQL Operation:-
Select:-
1. hive> SELECT * FROM info;
2. hive> SELECT a.name FROM data a WHERE (a.date='20-11-2010');
Count:-
1. hive> SELECT COUNT(*) FROM info;
Creating custom UDFs:-
1. Create a new class that extends UDF, with one or more methods named evaluate.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
public final class Lower extends UDF
{
public Text evaluate(final Text s)
{
if (s == null) { return null; }
return new Text(s.toString().toLowerCase());
}
}
(Note that there's already a built-in function for this, it's just an easy example).
2. Compile the class and package it into a jar (e.g. my_jar.jar).
3. Add the jar to the Hive classpath:
hive> add jar my_jar.jar;
Added my_jar.jar to class path
4. Once Hive is started up with your jar in the classpath, the final step is to register your function:
hive> create temporary function my_lower as 'Lower';
5. Now you can start using it:
hive> select my_lower(title), sum(freq) from titles group by my_lower(title);
Hi, I tried running Hive but got an error as follows:
*************************************************
rakesh@ubuntu:~/Downloads/hive-0.6.0-bin$ bin/hive
Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Could not create the Java virtual machine.
*************************************************
Where to mention the heap size and how much?
Do you have any idea about this error?
Hi Rakesh, this problem generally occurs in the hive-0.6.0 version.
Solution:
1. Go to the $HIVE_HOME/bin/ext/util directory and open execHiveCmd.sh.
2. Replace HADOOP_HEAPSIZE=4096 with HADOOP_HEAPSIZE=1024.
Hi Ankit, I have a file with 2 columns separated by a tab. When I load the file, it loads both columns from the file into the same column in Hive. How do I split them into 2 columns in Hive?
Thanks a lot!
Hi,
Hive by default uses Ctrl-A as the field delimiter.
You need to specify one extra clause, FIELDS TERMINATED BY '\t', in the create table statement.
Example:
CREATE TABLE leagues (
foo_id STRING,
foo_name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
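A quick Python sketch of why the clause matters: with the default Ctrl-A delimiter, a tab-separated line never splits, so everything lands in the first column, which is exactly the behavior described in the question (the sample line is illustrative):

```python
# With Hive's default \x01 delimiter, a tab-separated line does not split,
# so the whole line ends up in the first column; FIELDS TERMINATED BY '\t'
# tells Hive to split on tabs instead. (sample line is illustrative)
line = "1\tPremier League"
print(line.split("\x01"))  # ['1\tPremier League'] -> one merged column
print(line.split("\t"))    # ['1', 'Premier League'] -> two columns
```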
Is there any way to combine two or three columns into one column in Hive?
Thanks
Hi Ankit,
Can we insert values into Hive tables as we do in SQL (INSERT INTO table VALUES (1, 'joe'))?
Does HiveQL provide this? If yes, please give an example.
Thanks
Hi Yogesh,
We can't insert individual rows into a Hive table; data is loaded from files with LOAD DATA or from other tables with INSERT OVERWRITE ... SELECT.
Hi Ankit, when I am creating a table it is saying this:
"hive> Create table info(name INT , add STRING);
FAILED: Parse Error: line 1:29 mismatched input 'add' expecting Identifier near ',' in column specification"
What is the problem?
Hi Ashok,
add is a reserved keyword in Hive; please use a different column name, e.g. Create table info(name INT , addr STRING);
Hi Ankit,
Is it possible for you to spare some time and give 3-4 sessions on Hadoop? I am looking for an industry person who can assist me with some doubts. I will be grateful to you and am ready to pay for the classes. My email is jaspalsingh.mainframes@gmail.com and number is 8860007958. Thanks
It's unique.. nice blog for Hive.. thanks for sharing..
http://bit.ly/2cSVJxm