GeoKettle is a powerful, metadata-driven spatial ETL (Extract, Transform and Load) tool dedicated to integrating different data sources in order to build and update geospatial databases and data warehouses. It supports the Extraction of data from data sources; the Transformation of that data to correct errors, cleanse it, change its structure, and make it compliant with defined standards; and the Loading of the transformed data into a target Database Management System (DBMS), GIS file, or geospatial web service.
This Quick Start describes how to:
- Load an existing data transformation
- Create a new data transformation
As illustrated in the following screenshot, the Workbench window is composed of different panels.
The left part acts as a catalog containing all the steps that can compose a data transformation. The right part of the workbench is the area where the transformation itself is designed, run, and debugged.
The contents of these panels will be described further as we demonstrate their use.
To load an existing transformation, select File ‣ Open. Browse to the transformation samples subdirectory samples/transformations/geokettle, then select one of the available sample transformations and click OK. GeoKettle transformations are stored in files with the extension « .ktr ».
The following picture shows the sample « intersection » transformation. You should observe that the content of the two main parts of the workbench has changed.
A description of the transformation and optional directives can be seen in the yellow tooltip area.
Before starting the transformation, you need to specify which shapefile to use. To do so, double click on each of the « GIS file input » steps to make the following dialog appear (Note: you can customize any step of any transformation by double clicking on it).
Enter the name of your shapefile, including the « .shp » extension, or leave it as is to use the sample dataset, then click OK.
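To check which sample transformations are available before opening one in the workbench, the samples directory can also be listed from a script. The following is only a minimal sketch: the function name `list_transformations` is hypothetical, and the relative path is assumed to be resolved from your GeoKettle installation directory.

```python
from pathlib import Path


def list_transformations(samples_dir):
    """Return the sorted names of all .ktr files directly inside samples_dir."""
    return sorted(p.name for p in Path(samples_dir).glob("*.ktr"))


if __name__ == "__main__":
    # Assumption: the script is run from the GeoKettle installation directory.
    for name in list_transformations("samples/transformations/geokettle"):
        print(name)
```

Any file name printed by this sketch can then be opened through File ‣ Open as described above.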
You are now ready to start the transformation. To do so, simply hit the play button in the toolbar above your transformation.
Launch GeoKettle and access the workbench in the same way as when loading an existing transformation (see previous section).
To create a new transformation, select File ‣ New ‣ Transformation. You can specify the name of the transformation by saving it under a different name (select File ‣ Save as...).
As shown in the following picture, all available steps are listed by category in the left area of the workbench. Expand any category to see its available steps.
To add a new step to the transformation, drag it from the Steps panel to the transformation panel. You may then customize the newly added step by double clicking on it.
Hops
A hop, represented as an arrow between two steps, defines the dataflow between those steps. As shown in the following picture, adding a hop from Table Input to Add sequence means that the output of Table Input will be sent to the Add sequence step for further processing, and so on along the chain.
To create a new hop, select two steps, right click on one of them and select New hop. Another way of doing it is to press and hold Ctrl while selecting the two steps.
Any hop can be edited at any time by double clicking on it, or by right clicking on it and selecting edit hop in the popup menu.
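Conceptually, a hop behaves like a pipe that hands the rows produced by one step to the next step. The sketch below illustrates that idea with plain Python generators. It is not how GeoKettle is implemented: the functions `table_input` and `add_sequence` are hypothetical stand-ins named after the steps mentioned above, and the field name `seq` and the sample rows are made up for the illustration.

```python
def table_input(rows):
    """Stand-in for a Table Input step: emits one row (a dict) at a time."""
    for row in rows:
        yield dict(row)


def add_sequence(rows, field="seq", start=1):
    """Stand-in for an Add sequence step: adds an incrementing counter to each row."""
    for number, row in enumerate(rows, start=start):
        row[field] = number
        yield row


if __name__ == "__main__":
    source = [{"name": "Paris"}, {"name": "Quebec"}]
    # The "hop": the output of table_input becomes the input of add_sequence.
    for row in add_sequence(table_input(source)):
        print(row)
```

Each row leaves the first step and is immediately available to the second, which is the behaviour the arrow in the workbench represents.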
Setting up the transformation
Most steps in a transformation require custom parameters before they can be used. Double click on any step to display a dialog in which you can view and set each required parameter value.
Running a transformation in GeoKettle
When executing a transformation in GeoKettle, a new panel appears below the one where the transformation is designed. This panel (aka the Execution Results panel) contains information about data flow through all steps involved in the transformation.
The Step Metrics tab (shown in the next figure) is displayed initially. This tab shows general information about the transformation's dataflow, such as the number of rows read, written, input and output in each step. The Active column indicates whether the step is started, running, finished, aborted, etc. The time elapsed since the step started is shown in the Time column, and the average speed of the step (in rows per second) is shown in the Speed column.
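As a rough guide to reading the Speed column, an average speed in rows per second can be understood as a row count divided by the elapsed time. The sketch below shows that arithmetic only. Which row counter GeoKettle actually uses for this column is not stated here, so the `rows_processed` argument and the function name `step_speed` are assumptions made for illustration.

```python
def step_speed(rows_processed, elapsed_seconds):
    """Average throughput of a step, in rows per second."""
    if elapsed_seconds <= 0:
        # No measurable time has elapsed yet, so report no throughput.
        return 0.0
    return rows_processed / elapsed_seconds


if __name__ == "__main__":
    # Hypothetical figures: 1000 rows handled in 4 seconds.
    print(step_speed(1000, 4))
```

A step that shows a much lower speed than its neighbours is a reasonable place to start looking when a transformation runs slowly.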
Previewing a transformation
Trying to execute a transformation may result in errors in the Execution Results panel (see next figure). If so, review the content of the Logging tab, which generally contains useful information about the source and cause of the error. Modify the parameters of the faulty step and restart the transformation.
To help in finding the source of an error, you can also preview the results of a transformation from another step earlier in the workflow. To do so, right click on the step, and select Preview in the popup menu that appears. This way, you can see what the data looks like at this point in the overall process without executing the whole transformation.
Here are some additional challenges for you to try:
Take a look at the GeoKettle user and developer documentation and tutorials available at http://wiki.spatialytics.org