Home | Delphi Gurus | About Us

Wednesday, December 12, 2012

Pentaho Data Integration For The First Time Part I

In the previous article, we have learned how to intall Pentaho. In this article, we'll learn Pentaho Data Integration (Kettle). According to it's wiki, Pentaho Data Integerations is a part of Pentaho Studio that delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools. Before learning PDI deeper, make sure that you have download PDI files from sourceforge. Choose the latest one or 4.2.1-stable (I use this version) in this tutorial. Extract those files.

Before, running PDI, make sure your PC environtment is like the below table :

An operating system: 
• Linux (Red Hat Enterprise Linux 5, SUSE Linux Enterprise 10)
• Windows (XP, 7)
• Solaris 10
• Apple OS X (10.5 or newer)

A Java Runtime Environment

One or more data sources: 
• Any JDBC-compliant database
• A spreadsheet
• A flat file containing comma-separated values

I will use Windows 7 as operating system, and mySQL as database.

FYI, Pentaho Data Integration consists of these packet :

Spoon (graphical interface)
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen

• Kitchen (CLI job interface)
Kitchen is a program that executes jobs designed by Spoon in XML or in a database repository.

• Pan (CLI transformation interface)
Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data sources.

• Carte (CLI execution engine for PDI content)Carte is a simple web server that allows you to execute transformations and jobs remotely.  It does so by accepting XML (using a small servlet) that contains the transformation to execute and the execution configuration.  It also allows you to remotely monitor, start and stop the transformations and jobs that run on the Carte server.

A server that is running Carte is called a Slave Server in the Pentaho Data Integration terminology.

Acording to me, Spoon is the main packet of PDI. So, we will try to run spoon, create a simple transformation to make sure that PDI installation is OK. To run spoon (in windows environtment) we can execute spoon.bat. This file is in the folder where we extract the PDI files before. If everything is ok, we will see the spoon splash screen.



and the below picture is welcome screen of PDI.



1. Create Transformation

Transformation is set of steps to do something. To create a transformation, follow these steps :
a. Click CTRL+N or File > New > Transformation or Click Transformation Button in main toolbar.
b. Save , name it : MyTransformation

2. Create Database Connection

In this article, we use mySQL as database.  Before creating database connection, we must provide a database. We can use phpmyadmin to do this job. Create a database, name it : kettle (you can use other name)


To create a new connection, Click View Table, right click the Database Connections in the tree and select New or New Connection Wizard. You can also double click Database Connections, or press F3. Then, there will be new window (Database concetion). Give it "DB_MySQL" as its name. Choose MySQL as Connection type. Then, fulfill settings data located at righside of Connection type. Don't forget to choose "Native(JDBC)" Access type. For details see the following picture




In this tutorial, we will import data from txt file to database. To do this, we must prepare data in a txt file. Open a notepad, write on notepad the below data :

Name;Age
John;32
Hendry;40
Dunant;50
Kevin;23
Duran;19

Save it as people.txt. After creaing txt file, let's back to Spoon. Click Design tab. Then look at steps tree. Look for Input , and then drag text file input to myTransformation tab.



Next, double click text file input. There will be new window. 
- Give "people" as step name. 
- Click File tab.
- Click Browse button, choose people.txt, click Open.
- Click Add button to move filename into selected files table


- Click Fields tab to get data field from text file
- Click Get Fields Button to get all fields
- Click preview button to test that we configure it well
- If you get no error, click OK


Until here , we have done a half of this tutorial. In the next article, we will continue to save data into table

Labels: