Home | Delphi Gurus | About Us

Wednesday, December 12, 2012

Pentaho Data Integration For The First Time Part I

In the previous article, we have learned how to intall Pentaho. In this article, we'll learn Pentaho Data Integration (Kettle). According to it's wiki, Pentaho Data Integerations is a part of Pentaho Studio that delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag and drop design environment, and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools. Before learning PDI deeper, make sure that you have download PDI files from sourceforge. Choose the latest one or 4.2.1-stable (I use this version) in this tutorial. Extract those files.

Before, running PDI, make sure your PC environtment is like the below table :

An operating system: 
• Linux (Red Hat Enterprise Linux 5, SUSE Linux Enterprise 10)
• Windows (XP, 7)
• Solaris 10
• Apple OS X (10.5 or newer)

A Java Runtime Environment

One or more data sources: 
• Any JDBC-compliant database
• A spreadsheet
• A flat file containing comma-separated values

I will use Windows 7 as operating system, and mySQL as database.

FYI, Pentaho Data Integration consists of these packet :

Spoon (graphical interface)
Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle tools — Pan and Kitchen

• Kitchen (CLI job interface)
Kitchen is a program that executes jobs designed by Spoon in XML or in a database repository.

• Pan (CLI transformation interface)
Pan is a data transformation engine that performs a multitude of functions such as reading, manipulating, and writing data to and from various data sources.

• Carte (CLI execution engine for PDI content)Carte is a simple web server that allows you to execute transformations and jobs remotely.  It does so by accepting XML (using a small servlet) that contains the transformation to execute and the execution configuration.  It also allows you to remotely monitor, start and stop the transformations and jobs that run on the Carte server.

A server that is running Carte is called a Slave Server in the Pentaho Data Integration terminology.

Acording to me, Spoon is the main packet of PDI. So, we will try to run spoon, create a simple transformation to make sure that PDI installation is OK. To run spoon (in windows environtment) we can execute spoon.bat. This file is in the folder where we extract the PDI files before. If everything is ok, we will see the spoon splash screen.



and the below picture is welcome screen of PDI.



1. Create Transformation

Transformation is set of steps to do something. To create a transformation, follow these steps :
a. Click CTRL+N or File > New > Transformation or Click Transformation Button in main toolbar.
b. Save , name it : MyTransformation

2. Create Database Connection

In this article, we use mySQL as database.  Before creating database connection, we must provide a database. We can use phpmyadmin to do this job. Create a database, name it : kettle (you can use other name)


To create a new connection, Click View Table, right click the Database Connections in the tree and select New or New Connection Wizard. You can also double click Database Connections, or press F3. Then, there will be new window (Database concetion). Give it "DB_MySQL" as its name. Choose MySQL as Connection type. Then, fulfill settings data located at righside of Connection type. Don't forget to choose "Native(JDBC)" Access type. For details see the following picture




In this tutorial, we will import data from txt file to database. To do this, we must prepare data in a txt file. Open a notepad, write on notepad the below data :

Name;Age
John;32
Hendry;40
Dunant;50
Kevin;23
Duran;19

Save it as people.txt. After creaing txt file, let's back to Spoon. Click Design tab. Then look at steps tree. Look for Input , and then drag text file input to myTransformation tab.



Next, double click text file input. There will be new window. 
- Give "people" as step name. 
- Click File tab.
- Click Browse button, choose people.txt, click Open.
- Click Add button to move filename into selected files table


- Click Fields tab to get data field from text file
- Click Get Fields Button to get all fields
- Click preview button to test that we configure it well
- If you get no error, click OK


Until here , we have done a half of this tutorial. In the next article, we will continue to save data into table

Labels:

Sunday, November 4, 2012

Pentaho Installation


When I entered into Pentaho World for the first time, i asked my self how to install pentaho. It may happen to you. Actually, there is no installation step before using pentaho. Tthree things you must do before using pentaho are install Java , setup JAVA_HOME, and run your pentaho

1. Install Java

Installing Java is similar to installing other software. Just donload JDK, double click, follow the instructions until finish. U can get the latest version of JDK in oracle website

2. Setup JAVA_HOME

Now is the time to setup JAVA_HOME. Before setting up JAVA_HOME, you must know the installation path for the Java Development Kit. As example, i installed java in C:\Program Files\Java\jdk1.6.0_23.


Once you have the JDK installation path:
  1. Right-click the My Computer icon on
  2. your desktop and select Properties.
  3. Click the Advanced tab. Click the
  4. Environment Variables button. Under System Variables, click New.
  5. Enter the variable name as JAVA_HOME.
  6. Enter the variable value as the installation path for the Java Development Kit.
  7. Click OK.
  8. Click Apply Changes.
You might need to restart windows.
The complete article is here: Setting JAVA_HOME Variable in Windows
The bellow picture may help you 


2. Run Your Pentaho

FYI, Pentaho is collection of Business Intelligence (BI) application. It consists of Pentaho BI Server, Pentaho Data Integration (PDI), Pentaho Report Designer, etc. If you want to run one of pentaho's application, you just double click bat file that is included in its bundle. For example, if you want to run spoon (one of PDI component), just double click spoon.bat.
Oks, have a nice pentaho installation

Labels:

Friday, October 19, 2012

About Us

Business Intelligence (BI) is something new asspecially for me. Today, beside working at my beloved office as a application developer, I also learn Business Intelligence (BI). Pentaho is my choice. Why? Because it's free, opensource, etc.

Actually, this blog is not Pentaho Tutorial Blog. It's only blog where I write my roadmap to learn Pentaho. If you find something wrong with my article about Pentaho, you can send me an email and we'll discuss it : )

Labels: