Skip to content

Getting started tutorial

In this tutorial, will see to create your first ETL process with Etlang, and we will go through some essential language constructs. We assume you have a working IDE, you can get one working with the installation page.

Step 1: Create a new solution

First, we need to create a new project. Follow the same steps as in the video, or the textual description.

Warning

Videos were done with a previous version of the language. Mutustru has been fully rebranded to Etlang.

Use the New project wizard, give it a name, and make sure to keep Solution project selected.

Once your new solution is created, you can create a model. Look at the Logical View (left panel), with the right-click menu select New > Model. A window opens, and in the Used languages tab, add Etlang.

Info

Models declare which languages they use and hold a logical set of ETL processes also called Macros.

From there, you can create a new Macro, which defines your ETL process. Right-click on your model, and select New > Macro. In the main windows, you can give your Macro a name, for example, MyEtlMacro.

Step 2: Declare your file structure

Etlang is designed to handle tabular data, like CSV files, with a fixed number of columns.

A table for Etlang has a name, and a set of columns. Each column of a table has a unique name and contains a predefined type of value, for example, number, text, date, etc.

Here is our example table, with 5 columns:

firstname lastname age favouritColour likesBanana
John Doe 42 blue true
Jane Doe 42 red false
Alice Dreamer 36 green true

Let's define it in Etlang, so that it knows how to read the CSV file. In your Macro, in the central view, trigger the completion menu with Ctrl+Space then start to write file structure and select it among the completions.

A file structure is named and defines a table. Give a name for both, person for example. Then in the table section, you will need to provide the columns of your table, as well as their type.

The optionalFrom is used if you want to use a different name than the header in the CSV file. In that case, use the name you want, and in optionalFrom put the exact header used in the CSV file.

Step 3: Read the file

Now that we have declared our table, we can read the CSV file with the table structure defined above.

To read the file, we need to add a new instruction. After the file structure definition, go to a new line, trigger the completion with Ctrl+Space, write and select read structured file.

This operator needs a file path and a table structure. The file path should be a string thus type " followed by the filename, for example, "people.csv". For the table structure, your previously defined person should be proposed, by the completion.

read file from "hello.csv" with structure person

Step 4: Modify the table

Now that we have read the file, we can modify the table, for example, we can add a new column, with the birth year of the person.

It should look like this:

add column birthyear ⟵ 2023 - #Person.age

To write it, go to the next line after the read from file operation, and trigger the completion with Ctrl+Space. Select add column from the completion menu, then write the name of the new column, and press Enter. You have something like this:

add column birthyear ⟵ 

The only thing left is to give a value to the column. Write 2023 -, then you need to reference the value of the previous table read from the file.

The previous table is given by #table (it will be displayed as #Person). Access a column of the table, use a dot. In our case we want the age, thus write #Person.age.

Analyse table columns

After some operators, you may lose track of which column is in the table.

To see the columns of a table, use the action Show type on an operator: Place your cursor on the operation name, Ctrl+Shift+P.

You will see the column birthyear has been added.

Table type inspector

Step 5: Save the table

To save the table in our database, we need to have only columns: first name, last name, and birth year.

We can add one more operator to keep only these 3 columns, by restructuring the table, with the restructure operator. Restructuring a table is like replacing the table with a new one, you will therefore need to provide a new table with the table expression (not #table, this is to access the previous table).

Hereafter is a complete recap of what we have done with a video:

Bonus Step 6: Defining symbols

The current year is a bit hardcoded into the expression, we could extract the value and define an expression for later reuse.

To do that, go on a blank line before the initialize from file operator (to avoid cutting the sequence of operation) and write a define expression statement. For example define 2023, with the name currentYear. Then in place of the 2023 while adding a column, use the name of your expression, currentYear.

Why not cut the sequence of operation ?

You should keep the sequence of operations together without blank lines, or other things than operators. Otherwise the reference to the previous table does not work as there is no table produced on the line just above.

Success

We have written our first ETL process, and we have seen some essential language constructs. You can go deeper by reading the language reference.