Beam SQL 擴充功能:使用者定義函數
如果 Beam SQL 沒有符合您需求的純量函數或聚合函數,則可以在 Java 中編寫並在您的 SQL 查詢中調用它們。這些通常稱為 UDF(用於純量函數)和 UDAF(用於聚合函數)。
建立和指定使用者定義函數 (UDF)
UDF 可以是以下之一
- 任何接受零個或多個純量欄位並傳回一個純量值的 Java 方法。
- 一個
SerializableFunction
。
以下是 UDF 的範例以及如何在 DSL 中使用它
/**
* A example UDF for test.
*/
public static class CubicInteger implements BeamSqlUdf {
public static Integer eval(Integer input){
return input * input * input;
}
}
/**
* Another example UDF with {@link SerializableFunction}.
*/
public static class CubicIntegerFn implements SerializableFunction<Integer, Integer> {
@Override
public Integer apply(Integer input) {
return input * input * input;
}
}
// Define a SQL query which calls the above UDFs
String sql =
"SELECT f_int, cubic1(f_int), cubic2(f_int)"
+ "FROM PCOLLECTION "
+ "WHERE f_int = 2";
// Create and apply the PTransform representing the query.
// Register the UDFs used in the query by calling '.registerUdf()' with
// either a class which implements BeamSqlUdf or with
// an instance of the SerializableFunction;
PCollection<Row> result =
input.apply(
"udfExample",
SqlTransform
.query(sql)
.registerUdf("cubic1", CubicInteger.class)
.registerUdf("cubic2", new CubicIntegerFn())
建立和指定使用者定義聚合函數 (UDAF)
Beam SQL 可以接受 CombineFn
作為 UDAF。註冊方式與上述 UDF 範例類似
/**
* UDAF(CombineFn) for test, which returns the sum of square.
*/
public static class SquareSum extends CombineFn<Integer, Integer, Integer> {
@Override
public Integer createAccumulator() {
return 0;
}
@Override
public Integer addInput(Integer accumulator, Integer input) {
return accumulator + input * input;
}
@Override
public Integer mergeAccumulators(Iterable<Integer> accumulators) {
int v = 0;
Iterator<Integer> ite = accumulators.iterator();
while (ite.hasNext()) {
v += ite.next();
}
return v;
}
@Override
public Integer extractOutput(Integer accumulator) {
return accumulator;
}
}
// Define a SQL query which calls the above UDAF
String sql =
"SELECT f_int1, squaresum(f_int2) "
+ "FROM PCOLLECTION "
+ "GROUP BY f_int1";
// Create and apply the PTransform representing the query.
// Register the UDAFs used in the query by calling '.registerUdaf()' by
// providing it an instance of the CombineFn
PCollection<Row> result =
input.apply(
"udafExample",
SqlTransform
.query(sql)
.registerUdaf("squaresum", new SquareSum()));