ETL Task Custom Objects
In my previous article, I shared how to write your own ETL task objects. Oftentimes, we face unforeseeable conditions when tacking application tasks. In this context, it makes sense that we have the possibility to fine tune ETL task objects.
Luckily, the Prognoz Platform provides such possibility. I this article, I’ll tell you how it works and how to integrate your objects into the system user interface.
For illustration purposes, I’ll use a real-world case. It is simplified and adapted to demonstrate how the Prognoz Platform is used in day-to-day operations (truth be told, our platform is widely used for automating day-to-day operations).
We have a set of XML files containing results of weekly source code analysis for each week. Our task is to import these files into a table, build a standard cube on top of the table, construct a report based on the cube, and report on the code status to managers using visual time schedules.
Initially, we have an XML file, a small portion of which is provided below:
Finally, we get the following table for the cube:
Table fields are linked to XML file records as follows: the Project field corresponds to the Project line, the Analyzer field corresponds to the ErrorCode line, the Calendar field is linked to the ModificationDate line, and the Error Level field is linked to the Level line. The Calendar Type field defines intervals and is the same for all records. In fact, it’s just a calculable table field.
The Project, Analyzer, Calendar, and Calendar Type fields are linked to dictionaries and form dimensions. The Fact field presents cube facts and is a calculable field.
The fact that all file data are presented as strings is a major challenge. More specifically, projects are given by names, while errors are given by alphanumeric abbreviations. Cubes utilize dictionaries links to which are built on numeric keys. Moreover, the file contains data for analysis. These data should be also read from the file or given explicitly via the user interface.
To transform file strings correctly and to handle dates, I’m going to use custom objects.
To keep it simple without going into technical details, herein I’ve provided a test repository (Sqlite DBMS). You can find guidelines on how to install a personal version of the Prognoz Platform and how to set up the repository in my first article. All objects are located in Code Analysis folder and this time the entire code is written in Fore. This folder contains predefined dictionaries, the cube, its table, application modules and forms, and two ETL tasks. The data source (XML file) is also stored as a Document object in the repository (in the Source Data subfolder) to save users time and avoid extra downloads and settings.
How it’s done
In what follows, I will describe how to implement a custom source and a converter. Minimal implementation of objects was described in my first article. Here, I’ll focus on configuring custom objects via the user interface.
When it comes to a custom source, all transformations are performed in the source object. All you need to do is to select the desired Document-type repository object. Let’s use a standard consumer — Repository Object – Standard Cube.
For a custom converter, let’s use a standard source (XML file) and consumer (Repository Object – Standard Cube). In this case, we cannot take a date from anywhere. So, I’ve added a setting for specifying the desired date manually.
Note. It would be a lot easier to add one more source with the proper XPath query and then integrate data from two sources. In this example, it’s more important for us to show how you can configure the custom converter.
You might remember in my first article I described custom objects. To enable user settings, support of one more interface needs to be added: IEtlForeWizardPages. This interface is very simple. First, it must return the number of custom pages displayed in system wizards. Second, it must return links to forms that implement respective pages.
For any object it will look as follows:
Public Class PVSAnalyzer : Object, IDtCustomProvider (or any other custom object), IEtlForeWizardPages // IDtCustomProvider (or any other custom object) // … // IEtlForeWizardPages // Function Item(index: Integer): IMetabaseObjectDescriptor; Begin Select Case index Case 0: Return Metabase.ItemById("ANALYZER_PROVIDER_PAGE"); End Select; End Function Item; Function get_Count: Integer; Begin Return 1; End Function get_Count; End Class PVSAnalyzer;
Now we need to implement pages that return from the Item method. In this case, we have only one page.
To do so, we need to implement a form class and add interface implementation IEtlForeWizardPage to it.
For a source, the page in the development environment will look as follows:
Its corresponding code will be as follows:
Class ANALYZER_PROVIDER_PAGEForm : Form, IEtlForeWizardPage Label1: Label; EditBox1: EditBox; Button1: Button; MetabaseOpenDialog1: MetabaseOpenDialog; Model: IEtlUserDataContainer; Public Function OnSetActive: Boolean; Begin SetModelData; Return True; End Function OnSetActive; Public Function OnWizardNext: Integer; Begin If (Model <> Null) And (Model.UserData <> Null) Then Model.UserData.setAttribute("FileName", EditBox1.Text); End If; Return 0; End Function OnWizardNext; Public Function OnWizardBack: Integer; Begin If (Model <> Null) And (Model.UserData <> Null) Then Model.UserData.setAttribute("FileName", EditBox1.Text); End If; Return 0; End Function OnWizardBack; Public Function IsLeaveAllowed: Boolean; Begin Return True; End Function IsLeaveAllowed; Public Function IsDoneAllowed: Boolean; Begin Return False; End Function IsDoneAllowed; Public Sub set_Model(m: IEtlUserDataContainer); Begin Model := m; SetModelData; End Sub set_Model; Public Function get_Title: String; Begin Return "Выбор папки с файлами"; End Function get_Title; Public Sub set_Title(t: String); Begin End Sub set_Title; Sub SetModelData; Var DataR: variant; Begin If (Model <> Null) And (Model.UserData <> Null) Then DataR := Model.UserData.getAttribute("FileName"); If Not Model.UserData.getAttribute("FileName").IsEmpty And (DataR <> Null) Then EditBox1.Text := DataR As String; End If; End If; End Sub SetModelData; Sub Button1OnClick(Sender: Object; Args: IMouseEventArgs); Begin If MetabaseOpenDialog1.Execute(Self) Then EditBox1.Text := MetabaseOpenDialog1.Object.Id; End If; End Sub Button1OnClick; End Class ANALYZER_PROVIDER_PAGEForm;
It the ETL Task interface, it will look as follows:
1. Custom source
2. Custom converter
As you can see, there’s nothing too special. :) Though the code is long enough.
Our code consists of two parts: implementation of the object’s logic and implementation of its visible part. These two parts know nothing about each other and interact exclusively via serialized views. To make sure that everything works smoothly and correctly, load/save code needs to be implemented properly. These are Load/Save and set_ModelData/OnWizardNext methods of the custom object and form respectively.
The Load method loads previously saved state of the object. The Save method saves respectively. To implement your objects, simply take a ready code from the example and adjust it for your needs. This is because of strictly defined structure of the XML element demanding that any user data manipulations be inside the UserData element.
The set_ModelData method is used to restore user interface state correctly. The OnWizardNext method saves user changed fields to a serialized view. This method is called by the core by clicking the Next and Finish buttons.
What is the final result?
I hope you find the above information useful and insightful. My description, however, would be incomplete without demonstration of the final result.
As a result of the ETL Task, in the Code Analysis folder (you can run either of tasks, the result will be the same) you get (for data analysis, I used the OLAP tool) the following:
In a table form it looks as follows:
As a chart it looks as follows:
This is not the whole story
Saving the best for last, I’ll show one more perk of the Prognoz Platform — a custom object embeddable to the list of standard sources/consumers/converters. It is used to embed own objects into any ETL task or multiple objects in one task. You select the respective module and class once to define a template. Thereafter, you work with the predefined template.
You can add your template through the ETL Task menu. For sources, the steps are as follows:
In the dialog box that appears, you can create your own data source.
After you complete all fields and click OK, a new custom source template is added and immediately appears on the list of available sources. What is particularly pleasant is that it is added to all tasks of the active repository. When the repository is opened on another machine, you need to refresh objects using the Navigator. The ETL Task will look as follows:
Now, you can add the Analyzer Files source similarly to all other sources.
For custom consumers and converters, procedures are the same as those described above.
Now this is the whole story
This completes the topic of custom objects in the ETL task. Hopefully, the series of my articles are interesting and useful.
If you have any questions or suggestions on this particular article, please contact me, and I promise to respond to them.