There is a new software company which is offering an Open Source alternative for data migration and integration to the vendors such as Informatica and Cast Iron Systems. The company is called Talend and they have recently released version 3.1 of their Talend Open Studio – an Eclipse-based data integration tool. Talend’s approach is very interesting. Remember all those code generation tools of yore. Well, it seems that this paradigm hasn’t completely gone out the window – the Talend Open Studio is a powerful code generator –either in Java or Perl. After the code is generated and compiled, it can then be run directly from the Studio workspace or exported and run in standalone mode. Talend also gives you the capability of implementing parameters for your jobs so the same code can run in different environments or contexts. But I’m getting ahead of myself. Let’s walk through a relatively simple scenario to introduce the tool. You’ll probably want to go to http://www.talend.com and download the latest version of the Talend Open Studio to follow along with steps outlined below.
When you download the tool the first thing you’ll have to do is set up your local Repository. Click the ‘…’ button to start the setup.
Fig 1 – Welcome Dialog
Only the User E-mail field is required to setup your local repository. Click ‘OK’ to return to the initial splash page. You’ll only need to return to do the repository setup once.
Fig 2 – Setting up the repository
The next thing to do is to create a new local project. After you’ve selected ‘Create a local project’ from the dropdown, click ‘Go!’
Fig 3 – Welcome Dialog – Create new Local Project
Talend displays a New Project dialog box and asks you what language you’d like to use. I’m a Java man myself so I that’s what I chose. The User Guide discourages you from mixing the two programming environments in the same workspace.
Fig 4 – The new project dialog
This creates the project on the initial splash screen. Click ‘Open’ to display the Talend Open Studio workspace.
Fig 5 – Welcome Dialog – Open your project
The user interface will be somewhat familiar to those accustomed to Eclipse. Your Repository is where your Jobs will be organized and stored. The Palette is where the components for your jobs reside.
It’s best to explore the interface with an actual job so let’s go ahead and create one. Right-click on the Job Design section of the Repository.
Fig 6 – Repository View – Job Designs
And create your job in the New Job dialog box. Only the job name is required.
Fig 7 – New Job Dialog
This brings you back to Talend workspace with the component palette visible. At this point you can start to get a sense of the power and flexibility available with the tool. The palette has a variety of File Input and Output components, Database integration tools, business systems integration tools for vendors like SAP and Salesforce.com, custom coding capabilities, and of course a Map component for transformations between systems.
Fig 8 – Component Palette
To make use of a component simply drag it out on to the canvas in the center part of the workspace. The input components push data into your jobs and the output components draw it out. Since I’m pretty heavily involved in Salesforce.com these days, let’s use a tSalesforceInput component for our first job. It’s under Business->Salesforce in the Component Palette. Drag that component onto the canvas and right-click it to bring up the context menu. Select “Settings” from the context menu to enter your connection information.
Fig 9 — Canvas
I’m going to download the Users in my Salesforce org to a spreadsheet so I’ve selected the User Module from the Basic Settings of the tSalesforceInput component. You’ll need to enter your connection parameters and remove the Query Condition – ‘name = talend’.
Next I’ll drag a tFileOutputDelimited component onto the canvas. To create a data flow I connect the two components. I’ll need to right-click again on tSalesforceInput and select
Row ->Main from the context menu.
Fig 10 – Context Menu
This will join the two components and create our first subjob. A subjob, which is highlighted in blue, is a grouping of components assembled to complete particular task. It is possible, though perhaps not advisable, to create several subjobs in a single job design. A particular job may also have pre-processing and post-processing components.
Fig 11 – Subjob on the canvas
Right-click on the tFileOutputDelimited component and bring up the settings menu. Click the sync columns button to copy the Salesforce User object schema to the schema of the output file component. You might also w
ant to change some other settings in your delimited file. I generally use comma-delimited files myself so I update the field separator, and click the Include Header checkbox.
Fig 12 – Component settings tab
I also select the CSV options checkbox on the Advanced settings tab to reveal the quotes as escape characters.
Fig 13 – tFileOuputDelimited Advanced Settings
Alright, we have a simple job built and ready to go. Click the Run tab at the bottom of the screen. I like to get a little feedback to see if you my job is working so I’ve selected the Statistics checkbox. You can get even more feedback by selecting Traces. When you click ‘Run’ your job will first compile and then execute. If there are errors in the job, Talend will display a stack trace in the console by default and conveniently you can switch to the code tab of the canvas to see what part of the job is causing the issue. You don’t modify Talend-generated code directly unless your code is hosted within a Custom Code component like a tJava or tJavaRow component.
Fig 14 – Run tab
I click ‘Run’ and can see from the statistics on the canvas that I successfully pulled down three rows into my delimited file.
So your might be asking yourself, “Hey wait a second, can’t I just do this with Data Loader?” And the answer is yes of course you can. In fact, Talend doesn’t have some of capabilities that we’ll need to do production ready loads of data with Salesforce – most notably it doesn’t have the success and failure reporting capabilities of Data Loader – so you can’t really pitch Data Loader yet. What we haven’t shown in this simple example is power of Talend to do lookups with external information , to visually map between fields and to do transformations to the data itself. We’ll take on these more potent capabilities in Part 2 of the series when we take a closer look at the tMap component. In one of my recent projects I used Talend to do the transformation and lookups of related information and used Data Loader to actually perform the inserts and updates to Salesforce.com.