Hadoop Hive UDF

Hive UDF (User-Defined Functions)

Sometimes the query you want to write can't be expressed easily using the built-in functions that Hive provides.
By letting you write a UDF (user-defined function), Hive makes it easy to plug in your own processing code and invoke it from a Hive query.
UDFs have to be written in Java, the language that Hive itself is written in.

There are three types of UDF in Hive:

1. UDFs (regular user-defined functions)
2. UDAFs (user-defined aggregate functions)
3. UDTFs (user-defined table-generating functions)

They differ in the number of rows that they accept as input and produce as output.

1) UDF:- Operates on a single row and produces a single row as its output. Most functions, such as mathematical functions, are of this type.

2) UDAF:- Works on multiple input rows and creates a single output row. Aggregate functions include functions such as COUNT and MAX.

3) UDTF:- Operates on a single row and produces multiple rows (a table) as output.
Table-generating functions are less well known than the other two types.
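The three shapes can be pictured in plain Java, independent of Hive. The sketch below is only a conceptual model: the class UdfShapes and its methods are hypothetical and are not part of Hive's API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class UdfShapes {
    // UDF shape: one input row -> one output row.
    static String upper(String row) {
        return row.toUpperCase();
    }

    // UDAF shape: many input rows -> one output row.
    static int max(List<Integer> rows) {
        return Collections.max(rows);
    }

    // UDTF shape: one input row -> many output rows.
    static List<String> split(String row) {
        return Arrays.asList(row.split(","));
    }

    public static void main(String[] args) {
        System.out.println(upper("hive"));
        System.out.println(max(Arrays.asList(3, 9, 4)));
        System.out.println(split("a,b,c"));
    }
}
```

Only the row counts matter here: the first method maps one value to one value, the second collapses a list to one value, and the third expands one value into a list.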

Ex:- Consider a table with a single column, x, which contains arrays of strings.

hive> CREATE TABLE arrays (x ARRAY<STRING>)
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\001'
      COLLECTION ITEMS TERMINATED BY '\002';

After running a LOAD DATA command, the following query confirms that the data was loaded correctly:

hive> SELECT * FROM arrays;
["a","b"]
["c","d","e"]

Next, we can use the explode UDTF to transform this table.
This function emits a row for each entry in the array, so in this case the type of the output column y is STRING.
The result is that the table is flattened into five rows:

hive> SELECT explode(x) AS y FROM arrays;
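The effect of explode on this table can be imitated in plain Java. This is a sketch of the semantics only, assuming the two array rows from the arrays table; the class ExplodeSketch and its explode method are hypothetical stand-ins, not Hive code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExplodeSketch {
    // Emits one output row per array element, as explode does for an ARRAY column.
    static List<String> explode(List<List<String>> rows) {
        List<String> out = new ArrayList<>();
        for (List<String> array : rows) {
            out.addAll(array);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> arrays = Arrays.asList(
            Arrays.asList("a", "b"),
            Arrays.asList("c", "d", "e"));
        // Two input rows become five output rows.
        for (String y : explode(arrays)) {
            System.out.println(y);
        }
    }
}
```

Running it prints a, b, c, d and e on separate lines, which corresponds to the five rows of column y.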

SELECT statements using UDTFs have some restrictions, such as not being able to retrieve additional column expressions in the same SELECT list.

Writing a Hive UDF:-

We can write a simple UDF to trim characters from the ends of strings.
Hive already has a built-in function called trim, so we will call ours strip.
The code for the Strip Java class is shown below:

package com.hadoopbook.hive;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class Strip extends UDF {
  // Reused across calls to avoid allocating a new Text for every row.
  private Text result = new Text();

  // Strips leading and trailing whitespace.
  public Text evaluate(Text str) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString()));
    return result;
  }

  // Strips any of the characters in stripChars from both ends of the string.
  public Text evaluate(Text str, String stripChars) {
    if (str == null) {
      return null;
    }
    result.set(StringUtils.strip(str.toString(), stripChars));
    return result;
  }
}

A UDF must satisfy the following two properties:

1. A UDF must be a subclass of org.apache.hadoop.hive.ql.exec.UDF.
2. A UDF must implement at least one evaluate() method.

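The Strip class has two evaluate() methods, which are not defined by an interface, because an evaluate() method may take any number of arguments of any type and return a value of any type. Hive inspects the class to find the evaluate() method that matches the function call.
The first method strips leading and trailing whitespace from the input, while the second strips any of a set of supplied characters from the ends of the string.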
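Looking up a matching evaluate() overload by name and argument types can be illustrated with standard Java reflection. The sketch below is not Hive's actual resolution code; the classes EvaluateLookup and MiniStrip are hypothetical stand-ins, they use String instead of Hadoop's Text, and the lookup assumes non-null arguments.

```java
import java.lang.reflect.Method;

class EvaluateLookup {
    // Hypothetical stand-in for a UDF class with two evaluate() overloads.
    public static class MiniStrip {
        public String evaluate(String str) {
            return "evaluate(str)";
        }

        public String evaluate(String str, String stripChars) {
            return "evaluate(str, stripChars)";
        }
    }

    // Finds the public evaluate() overload whose parameter types match
    // the runtime types of the arguments, then invokes it.
    static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no matching evaluate() method", e);
        }
    }

    public static void main(String[] args) {
        MiniStrip udf = new MiniStrip();
        System.out.println(call(udf, "banana"));
        System.out.println(call(udf, "banana", "ab"));
    }
}
```

The one-argument call reaches the first overload and the two-argument call reaches the second, without either method being declared in a shared interface.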

To use the UDF in Hive, package the compiled Java class in a JAR file and register the file with Hive:

hive> ADD JAR /path/to/Hive-examples.jar;

We also need to create an alias for the Java class name:

hive> CREATE TEMPORARY FUNCTION strip AS 'com.hadoopbook.hive.Strip';

As an alternative to calling ADD JAR, you can specify at launch time a path where Hive looks for auxiliary JAR files to put on its classpath.
This technique is useful for automatically adding your own library of UDFs every time you run Hive.
There are two ways of specifying the path: either by passing the --auxpath option to the hive command, as below:

% hive --auxpath /path/to/Hive-examples.jar

or by setting the HIVE_AUX_JARS_PATH environment variable before invoking Hive.

The UDF is now ready to be used, just like a built-in function:

hive> SELECT EMPID, strip(EMPNAME), ESAL FROM Employee;

(Or)

hive> SELECT strip('banana', 'ab') FROM dummy;

Output is: nan
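The result of strip('banana', 'ab') can be checked without a Hive session. The sketch below re-implements the two-argument stripping in plain Java as a stand-in for StringUtils.strip from Commons Lang; the class StripCheck is hypothetical and only covers non-null arguments.

```java
class StripCheck {
    // Removes any of the characters in stripChars from both ends of str.
    static String strip(String str, String stripChars) {
        int start = 0;
        int end = str.length();
        while (start < end && stripChars.indexOf(str.charAt(start)) >= 0) {
            start++;
        }
        while (end > start && stripChars.indexOf(str.charAt(end - 1)) >= 0) {
            end--;
        }
        return str.substring(start, end);
    }

    public static void main(String[] args) {
        // Leading "ba" and trailing "a" are removed; the inner "a" stays.
        System.out.println(strip("banana", "ab")); // prints "nan"
    }
}
```

The characters in the second argument are treated as a set, not as a prefix or suffix, which is why both the leading b and the a characters at either end are removed.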


Last updated: 04 Apr 2023

About Author

Ravindra Savaram

Ravindra Savaram is a Technical Lead at Mindmajix.com. His passion lies in writing articles on the most popular IT platforms including Machine learning, DevOps, Data Science, Artificial Intelligence, RPA, Deep Learning, and so on. You can stay up to date on all these technologies by following him on LinkedIn and Twitter.
