ETL Task Custom Sources, Consumers and Converters
ETL Task supports numerous sources, consumers, and converters such as Excel, dBase, text files with XML, and a variety of repository objects.
What to do if you need to use data formats that are not supported by the platform or transform data in some specific way? Here custom sources, custom consumers and custom converters come to your rescue.
The use of ETL requires no programming skills; everything is done using the user interface of the platform. However Fore/Fore.NET programming skills will be required for handling custom objects.
Let’s create a custom source. The figure below illustrates how the wizard for creating a new ETL object looks like:
If you read the message across the top of the window, you’ll see that you need to implement either of two interfaces: IDtRecordsetProvider or IDtCustomProvider.
The easiest way is to implement the IDtRecordsetProvider interface. This interface provides only one method that takes no parameters and returns a two-dimensional array.
Despite many other pros of such implementation, there are some cons. :)
First, the code can be written only in Fore, Fore.NET is not used. :(
Second, you cannot manage filed types and names for a data source. Field names are given by the platform core and look like FIELD1, FIELD2, and so on. Data types are determined by the core automatically through analysis of the content of the first few records in the source.
Third, you cannot manage data acquisition algorithm and manage memory; all data return as an entire array as a result of one call; and so on and so forth.
Below is the complete implementation of IDtRecordsetProvider. As you can see, the implementation is very simple in deed. In our example, it returns a two-dimensional array. Rows return in the first column of the array, real numbers return in the second column.
Public Class UserProvider : Object, IDtRecordsetProvider Public Function Fetch : Array Of Variant; Var Data: Array[0..1, 0..1] Of Variant; Begin Data[0, 0] := "Строка 1"; Data[0, 1] := "Строка 2"; Data[1, 0] := 1.0; Data[1, 1] := 2.0; Return Data; End Function Fetch; End Class UserProvider;
If you are not happy with these limitations, then use IDtCustomProvider. This method is far more sophisticated than the first one and supports a far larger number of methods. Using its interface, you can write in Fore.NET. This means you can leverage Microsoft.NET at full blast, specify your data names and types for fields, manage the process of data delivery to the customer, manage memory consumption, and so on.
Note. For more detained information on the Fore.NET language, refer to the Online Documentation. You are strongly recommended to have a look at the documentation.
Here I won’t provide the complete implementation of IDtCustomProvider, since it’s too long. Minimal code is given in the test repository. You can find information on how to use the test repository at the end of my article.
Now, you’ve learned how to deal with data sources. Let’s see what you need to do to implement custom consumers. As you might guess, you need to write implementation for either IDtRecordsetConsumer or IDtCustomConsumer interface. :)
Pros and cons of both consumer implementations are similar to those described earlier for source implementations. The only difference is a set of methods accessible form their interfaces.
Below is the implementation of a custom consumer. As you can see, It’s little more sophisticated than the one used for sources.
Public Class UserConsumer : Object, IDtRecordsetConsumer Public Sub Put(Data: Array Of Variant); Var i,j : Integer; Begin For i := Data.GetLowerBound(2) To Data.GetUpperBound(2) Do For j := Data.GetLowerBound(1) To Data.GetUpperBound(1) Do Debug.WriteLine(Data[j, i]); End For; End For; End Sub Put; Public Sub Clear; Begin End Sub Clear; End Class UserConsumer;
The example of the IDtCustomConsumer implementation is even longer and is provided in the test repository as well.
We are done with data sources and consumers. Let’s consider how to implement your own data transformation mechanism. What needs to be done for that? Look at the wizard window again. It contains a tip, which requests you to specify a certain macros from a certain module.
Let’s see, what kind of macros is needed and how to write it. There are three ways to achieve that:
- Write a static member procedure of any class (Shared in terms of the Fore/Fore.NET languages)
- Write a global procedure (it is true only for Fore, since Fore.NET has no global procedures)
- Write a non-static member procedure of the Transform class.
In any case, the procedure should take two parameters: input and output data sets of the IEtlPlainRecordSets type.
In our example, the UserTransform global procedure is selected.
Below is the example of your own data transformer implementation. In our example, the transformer adds up data in the first fields of two data sources. The remaining fields of the first source are passed to output without any changes, fields of the source are not used.
To simplify the logic, in our example, it is assumed that both sources have the same number of records. In reality, everything is not that simple.
Public Sub UserTransform(Input, Output: IEtlPlainRecordSets); Var RecordSetIn1, RecordSetIn2, RecordSetOut: IEtlPlainRecordSet; RecordOut: IEtlPlainRecord; RecordIn1, RecordIn2: IEtlPlainRecord; i, j: Integer; Begin RecordSetIn1 := Input.Item(0); // Вход для 1-го источника RecordSetIn2 := Input.Item(1); // Вход для 2-го источника RecordSetOut := Output.Item(0); // Выход на приёмник For i := 0 To RecordSetIn1.Count - 1 Do RecordSetOut.Add; RecordOut := RecordSetOut.Item(i); RecordIn1 := RecordSetIn1.Item(i); RecordIn2 := RecordSetIn2.Item(i); For j := 0 To RecordIn1.Count - 1 Do If j = 0 Then RecordOut.Value(j) := "S1:" + RecordIn1.Value(j) + " S2:" + RecordIn2.Value(j); Else RecordOut.Value(j) := RecordIn1.Value(j); End If; End For; End For; End Sub UserTransform;
Able to have multiple inputs and outputs, the transformer allows combining data between sources and consumers in different ways. In reality, its use is limited only by the developer’s imagination and task requirements.
So, if we gather together all objects that we’ve implemented, we’ll get the following ETL task.
This task has the same threads with absolutely the same sources, while transformers and consumers are written differently and even in different languages.
The task has no sense in practical terms. But it illustrates the ease of implementing objects for using user data and algorithms in the Prognoz Platform.
Useful information links to probe further
To avoid too many technical details in my article, I’ve placed a test repository (SQLite DBMS) in our cloud. This repository contains all objects used to write this article. For those interested, you can download it to browse and use it as desired.
The repository is pretty simple. It has objects in Fore (OBJ63) and in Fore.NET (OBJ56) along with the ETL task that uses these objects (OBJ69).
To run it, you’ll need a copy of the Prognoz Platform. To download a personal version, go to our Web site. Note that this version always opens a demo repository. To open my repository, first open the Navigator, then select Open Repository. For guidelines how to connect the existing repository, refer to the Online Documentation for the Prognoz Platform.
To be continued
That’s all I wanted to tell you about in this article. I hope you find my article interesting and informative.
The ETL topic is not exhausted. Based on custom objects covered in this article, you can create full-fledged ETL objects using your own user interface, settings, and possibility to save/restore its state. I’ll share how to do that in my next article.