Translated from the original blog post at http://blog.csdn.net/liuj2511981/article/details/8523084. Thanks to the author!

Developing UDFs for Hive is very simple. The UDFs discussed here are temporary functions, so any Hive version from 0.4.0 onward is enough.

I. Background

Hive is built on Hadoop MapReduce and provides HQL queries over a data warehouse. Hive is a very open system, and many parts of it can be customized, including:

a) File formats: Text File, Sequence File
b) In-memory data formats: Java Integer/String, Hadoop IntWritable/Text
c) User-provided map/reduce scripts: written in any language, using stdin/stdout to transfer data
d) User-defined functions: Substr, Trim, 1-to-1
e) User-defined aggregate functions: Sum, Average, n-to-1

2. Definition: a UDF (User-Defined Function) is a user-defined function for processing data.

II. Usage

1. A UDF can be used directly in a SELECT statement to transform the query result before it is output.
2. When writing a UDF, note the following:
a) A custom UDF must extend org.apache.hadoop.hive.ql.exec.UDF.
b) It must implement an evaluate function.
c) The evaluate function supports overloading.

3. Below is a UDF that sums numbers. Its evaluate overloads add two Integers, add two Doubles, and sum a variable-length list of Integers.

Developing a Hive UDF only requires overriding the evaluate function of a UDF subclass. Example:

    package hive.connect;

    import org.apache.hadoop.hive.ql.exec.UDF;

    public final class Add extends UDF {
        public Integer evaluate(Integer a, Integer b) {
            if (null == a || null == b) {
                return null;
            }
            return a + b;
        }

        public Double evaluate(Double a, Double b) {
            if (a == null || b == null) {
                return null;
            }
            return a + b;
        }

        public Integer evaluate(Integer... a) {
            int total = 0;
            for (int i = 0; i < a.length; i++) {
                if (a[i] != null) {
                    total += a[i];
                }
            }
            return total;
        }
    }

4. Steps

a) Package the program into a jar and copy it to the target machine.

b) Enter the Hive client and add the jar: hive> add jar /run/jar/udf_test.jar;

c) Create a temporary function (note that the class name must match the package declared in the code, hive.connect): hive> CREATE TEMPORARY FUNCTION add_example AS 'hive.connect.Add';

d) Query with HQL statements:

SELECT add_example(8, 9) FROM scores;

SELECT add_example(scores.math, scores.art) FROM scores;

SELECT add_example(6, 7, 8, 6.8) FROM scores;

e) Drop the temporary function: hive> DROP TEMPORARY FUNCTION add_example;

5. Note that Hive applies implicit type conversion to UDF arguments, for example:

SELECT add_example(8, 9.1) FROM scores;

Notes:

1. A UDF can only implement one-in-one-out operations. If you need many-in-one-out, you must implement a UDAF.

Next, let's look at UDAF.

(II) UDAF

1. Some aggregate functions needed when querying data in Hive are not built into HQL and must be implemented by the user.

2. User-defined aggregate functions: Sum, Average, n-to-1.

UDAF (User-Defined Aggregation Function)

I. Usage

1. Two imports are required: org.apache.hadoop.hive.ql.exec.UDAF and org.apache.hadoop.hive.ql.exec.UDAFEvaluator.
2. The function class must extend the UDAF class, and an inner class Evaluator must implement the UDAFEvaluator interface.
3. The Evaluator must implement the init, iterate, terminatePartial, merge, and terminate functions.
a) init implements the init function of the UDAFEvaluator interface and resets the aggregation state.
b) iterate receives the incoming arguments and accumulates them internally. Its return type is boolean.
c) terminatePartial takes no parameters. It is called when iterate has finished over a partial data set and returns the partial aggregation state; terminatePartial is similar to Hadoop's Combiner.
d) merge receives the result returned by terminatePartial and merges it into the current state. Its return type is boolean.
e) terminate returns the final aggregation result.

    package hive.udaf;

    import org.apache.hadoop.hive.ql.exec.UDAF;
    import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;

    public class Avg extends UDAF {
        public static class AvgState {
            private long mCount;
            private double mSum;
        }

        public static class AvgEvaluator implements UDAFEvaluator {
            AvgState state;

            public AvgEvaluator() {
                super();
                state = new AvgState();
                init();
            }

            /** init is similar to a constructor; it initializes the UDAF state. */
            public void init() {
                state.mSum = 0;
                state.mCount = 0;
            }

            /** iterate receives the incoming argument and accumulates it. Returns boolean. */
            public boolean iterate(Double o) {
                if (o != null) {
                    state.mSum += o;
                    state.mCount++;
                }
                return true;
            }

            /**
             * terminatePartial takes no parameters. It is called when iterate has finished
             * over a partial data set and returns the partial state; it is similar to
             * Hadoop's Combiner.
             */
            public AvgState terminatePartial() {
                // combiner
                return state.mCount == 0 ? null : state;
            }

            /** merge receives a terminatePartial result and merges it into the current state. Returns boolean. */
            public boolean merge(AvgState o) {
                if (o != null) {
                    state.mCount += o.mCount;
                    state.mSum += o.mSum;
                }
                return true;
            }

            /** terminate returns the final aggregation result. */
            public Double terminate() {
                return state.mCount == 0 ? null : Double.valueOf(state.mSum / state.mCount);
            }
        }
    }
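The call sequence described in 3(a)-(e) can be simulated in plain Java, without a Hive classpath. The sketch below mirrors AvgState and AvgEvaluator but drops the Hive UDAF/UDAFEvaluator types, so it only illustrates how Hive drives the evaluator (init once per partial aggregation, iterate once per row, terminatePartial on the map side, merge and terminate on the reduce side); it is not a deployable UDAF.

```java
// Plain-Java sketch of the Avg evaluator lifecycle; mirrors the UDAF above
// but omits the Hive UDAF/UDAFEvaluator types so it runs anywhere.
public class AvgLifecycleDemo {
    static class AvgState {
        long mCount;
        double mSum;
    }

    static class AvgEvaluator {
        AvgState state = new AvgState();

        void init() { state.mSum = 0; state.mCount = 0; }

        boolean iterate(Double o) {
            if (o != null) { state.mSum += o; state.mCount++; }
            return true;
        }

        AvgState terminatePartial() {
            return state.mCount == 0 ? null : state;
        }

        boolean merge(AvgState o) {
            if (o != null) { state.mCount += o.mCount; state.mSum += o.mSum; }
            return true;
        }

        Double terminate() {
            return state.mCount == 0 ? null : Double.valueOf(state.mSum / state.mCount);
        }
    }

    public static void main(String[] args) {
        // Two map tasks each aggregate part of the input rows ...
        AvgEvaluator map1 = new AvgEvaluator();
        map1.init();
        map1.iterate(80.0);
        map1.iterate(90.0);

        AvgEvaluator map2 = new AvgEvaluator();
        map2.init();
        map2.iterate(70.0);

        // ... and the reducer merges the two partial states.
        AvgEvaluator reducer = new AvgEvaluator();
        reducer.init();
        reducer.merge(map1.terminatePartial());
        reducer.merge(map2.terminatePartial());
        System.out.println(reducer.terminate()); // prints 80.0
    }
}
```

This also shows why merge must accept the exact type terminatePartial returns: the partial AvgState objects are what travels between the map and reduce sides.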

4. Steps to run the averaging function
a) Compile the Java file into Avg_test.jar.
b) Enter the Hive client and add the jar:
hive> add jar /run/jar/Avg_test.jar;
c) Create a temporary function:
hive> CREATE TEMPORARY FUNCTION avg_test AS 'hive.udaf.Avg';
d) Query:
hive> SELECT avg_test(scores.math) FROM scores;
e) Drop the temporary function:
hive> DROP TEMPORARY FUNCTION avg_test;

V. Summary

1. Overload the evaluate function as needed.
2. UDF parameters can be Writable types or Java wrapper objects for the basic data types.
3. UDFs support variable-length parameters.
4. Hive supports implicit type conversion.
5. Temporary functions are destroyed automatically when the client exits.
6. evaluate must declare a return type and return null for empty results; it must not be void.
7. A UDF computes over the columns of a single record, whereas a UDAF is a user-defined aggregate function that computes over all records of the table.
8. Both UDFs and UDAFs can be overloaded.
9. To list available functions:
SHOW FUNCTIONS;
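Points 1, 3, and 6 of the summary can be checked with plain Java. The sketch below reuses the evaluate logic of the Add example but omits extends UDF so it compiles without Hive on the classpath; Java's overload resolution picks the evaluate variant by argument type, and the varargs variant accepts any number of Integers and returns a value (never void, per point 6).

```java
// Overload resolution and varargs for evaluate(), shown standalone
// (a real UDF would additionally extend org.apache.hadoop.hive.ql.exec.UDF).
public class AddDemo {
    public Integer evaluate(Integer a, Integer b) {
        if (a == null || b == null) return null;
        return a + b;
    }

    public Double evaluate(Double a, Double b) {
        if (a == null || b == null) return null;
        return a + b;
    }

    public Integer evaluate(Integer... a) {
        int total = 0;
        for (Integer v : a)
            if (v != null) total += v;   // nulls are skipped, not an error
        return total;
    }

    public static void main(String[] args) {
        AddDemo add = new AddDemo();
        System.out.println(add.evaluate(8, 9));          // Integer overload: prints 17
        System.out.println(add.evaluate(8.0, 9.5));      // Double overload: prints 17.5
        System.out.println(add.evaluate(1, 2, 3, null)); // varargs overload: prints 6
    }
}
```

Note that the two-argument Integer overload wins over the varargs one for evaluate(8, 9), because Java prefers a fixed-arity match before falling back to varargs.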

UDTF

1. Introduction to UDTF
A UDTF (User-Defined Table-Generating Function) handles the one-row-in, many-rows-out (one-to-many mapping) case.
2. Writing your own UDTF
(1) Extend org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.

(2) Implement the initialize, process, and close methods.

A UDTF first calls the initialize method, which returns the shape of the rows the UDTF will emit (the number of fields and their types). Once initialization is complete, the process method is called to handle the incoming arguments; it can emit result rows through forward(). Finally, close() is called to perform any necessary cleanup.
Below is a UDTF I wrote that splits strings of the form "key:value;key:value;" and returns two fields, key and value, per entry. For reference:

    import java.util.ArrayList;

    import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
    import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    public class ExplodeMap extends GenericUDTF {

        @Override
        public void close() throws HiveException {
            // nothing to clean up
        }

        @Override
        public StructObjectInspector initialize(ObjectInspector[] args)
                throws UDFArgumentException {
            if (args.length != 1) {
                throw new UDFArgumentLengthException("ExplodeMap takes only one argument");
            }
            if (args[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                throw new UDFArgumentException("ExplodeMap takes string as a parameter");
            }
            // declare the two output columns and their types
            ArrayList<String> fieldNames = new ArrayList<String>();
            ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
            fieldNames.add("col1");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            fieldNames.add("col2");
            fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
            return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
        }

        @Override
        public void process(Object[] args) throws HiveException {
            String input = args[0].toString();
            String[] test = input.split(";");
            for (int i = 0; i < test.length; i++) {
                try {
                    String[] result = test[i].split(":");
                    forward(result);
                } catch (Exception e) {
                    continue;
                }
            }
        }
    }
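The string-splitting logic inside process() can be exercised outside Hive. The helper below is a standalone sketch of just the parsing (the ObjectInspector and forward() machinery is omitted), assuming the "key:value;key:value;" input format described above:

```java
import java.util.ArrayList;
import java.util.List;

// Standalone version of the parsing done in ExplodeMap.process():
// split on ';' to get entries, then on ':' to get the two columns.
public class ExplodeMapDemo {
    public static List<String[]> explode(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String entry : input.split(";")) {
            String[] kv = entry.split(":");
            if (kv.length == 2) {   // skip malformed entries instead of forwarding them
                rows.add(kv);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : explode("name:tom;age:25")) {
            System.out.println(row[0] + "\t" + row[1]);
        }
        // prints:
        // name	tom
        // age	25
    }
}
```

Each element of the returned list corresponds to one row the real UDTF would pass to forward().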

 

3. Usage

A UDTF can be used in two ways: directly after SELECT, or together with LATERAL VIEW.

1: Direct use in SELECT

select explode_map(properties) as (col1,col2) from src;

You cannot select other fields alongside it:

select a, explode_map(properties) as (col1,col2) from src

You cannot nest calls:

select explode_map(explode_map(properties)) from src

You cannot combine it with group by / cluster by / distribute by / sort by:

select explode_map(properties) as (col1,col2) from src group by col1, col2

2: Use with LATERAL VIEW

select src.id, mytable.col1, mytable.col2 from src lateral view explode_map(properties) mytable as col1, col2;

This method is more convenient in daily use. The execution is equivalent to extracting the two sides separately and then unioning them into one table.
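As a worked illustration (assuming a hypothetical src table with columns id INT and properties STRING, where one row holds id = 1 and properties = 'name:tom;age:25'), the LATERAL VIEW query above would emit one output row per key:value pair:

```sql
-- hypothetical data: src(id INT, properties STRING), one row (1, 'name:tom;age:25')
SELECT src.id, mytable.col1, mytable.col2
FROM src LATERAL VIEW explode_map(properties) mytable AS col1, col2;
-- expected result:
-- 1    name    tom
-- 1    age     25
```

Every row of src is paired with every row the UDTF generates from that row's properties value, which is why src.id can appear next to the generated columns.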
