Loading Data to Salesforce.com with the Bulk Data API using Talend

January 7, 2011 Appirio

Ward Loving

Talend recently added the ability to insert data using the new Salesforce.com Bulk Data API. This gives us two important features: first, we can load large volumes of data to Salesforce.com much more quickly than we can with the standard API; second, we get feedback directly from Salesforce.com about whether each record was successfully inserted. Loading data to Salesforce.com with the Bulk API and Talend is a two-step process. The first step is to prepare the data and create the load file. The second step is to use the tSalesforceBulkExec component to load the data and get the results. To demonstrate the data preparation part of the process, I’ve created a pipe-delimited file as the sample data we’ll load for this article.

Name|Phone|Type
Acme Lighting|(800) 555-1212|Prospect
American Information Systems|(222) 222-2222|Customer - Direct
Able Moving Company|(999) 999-9999|Customer - Direct

The data preparation part of this process converts the pipe-delimited file to a comma-delimited (CSV) file, which can then be loaded to Salesforce.com. The tSalesforceBulkExec component can only process files in CSV format, so data in any other form will have to be converted to CSV before it can be loaded with the Bulk API.
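If you’re curious what this step boils down to, here’s a minimal Java sketch of the same pipe-to-CSV conversion. It’s purely illustrative (Talend generates the equivalent Java for you), and the file paths are placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

// Minimal sketch of the pipe-to-CSV conversion this Talend step performs.
// Purely illustrative; the file paths are placeholders.
public class PipeToCsv {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader("C:/Talend/accounts.txt"));
        PrintWriter out = new PrintWriter("C:/Talend/bulkDataFile.csv");
        String line;
        while ((line = in.readLine()) != null) {
            // Split on the pipe delimiter; -1 keeps trailing empty fields.
            String[] fields = line.split("\\|", -1);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) row.append(',');
                // Trim stray whitespace and quote each field, doubling any
                // embedded quotes, so commas in values can't break the CSV.
                row.append('"').append(fields[i].trim().replace("\"", "\"\"")).append('"');
            }
            out.println(row);
        }
        out.close();
        in.close();
    }
}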

To follow along with this example, you’ll need to create a new job. I’ve called mine AccountLoad. Here is my workspace after dragging out the components for step 1:

Figure 1 – Components to Read Data File

Next, click on the tFileInputDelimited_1 component and set up the parameters for reading the pipe-delimited file:

Figure 2 – Parameters for Pipe-Delimited File

Then you’ll need to click the button labeled “…” to set up the schema for the input file.

Figure 3 – Schema for the Input File

Connect the two components by right-clicking the tFileInputDelimited component, selecting Row -> Main, and dropping the little plug icon on the tSalesforceOutputBulk component. This pushes the schema from the file input component to the Bulk Data file component automatically. The final step in creating your Bulk Data input file is to set the path of the output file for the tSalesforceOutputBulk component.

Figure 4 – Path for Bulk Data Load file

The second part of the AccountLoad job is setting up the tSalesforceBulkExec component to load the data to Salesforce.com. You’ll also want to drag two tFileOutputDelimited components onto the workspace to capture your success and error reports. I’ve labeled one of these files Success and the other Errors; you can do that on the View panel of the Component tab.

Figure 5 – Workspace with tSalesforceBulkExec Component

Set the Bulk file path of the Bulk Exec component to point to the output file generated by the tSalesforceOutputBulk component; in this case, that is “C:/Talend/bulkDataFile.csv”. You’ll also need to complete the parameters for your destination org: Salesforce WebService URL, Username, and Password. Finally, you’ll need to update the schema to reflect the columns in the data file; the process is similar to setting up the schema for the pipe-delimited input file. The parameters for your tSalesforceBulkExec component should look something like the following:

Figure 6 – Parameters for tSalesforceBulkExec Component
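For context, tSalesforceBulkExec is driving the Salesforce Bulk API’s REST endpoints behind the scenes. The following is a rough hand-rolled sketch of that protocol, not Talend’s actual code: it creates a CSV insert job for Account, submits the load file as a single batch, and closes the job. The instance URL and session ID are placeholders you’d normally obtain from a login call, and the API version (21.0) is arbitrary.

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Rough sketch of the Bulk API calls tSalesforceBulkExec makes for you.
// INSTANCE and SESSION are placeholders; get real values from a login call.
public class BulkInsertSketch {
    static final String INSTANCE = "https://na1.salesforce.com";
    static final String SESSION = "your-session-id";

    public static void main(String[] args) throws Exception {
        // 1. Create a CSV insert job for the Account object.
        String jobXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">"
            + "<operation>insert</operation><object>Account</object>"
            + "<contentType>CSV</contentType></jobInfo>";
        String job = post(INSTANCE + "/services/async/21.0/job",
                "application/xml", jobXml.getBytes("UTF-8"));
        String jobId = job.substring(job.indexOf("<id>") + 4, job.indexOf("</id>"));

        // 2. Submit the file written by tSalesforceOutputBulk as one batch.
        byte[] csv = readAll(new FileInputStream("C:/Talend/bulkDataFile.csv"));
        post(INSTANCE + "/services/async/21.0/job/" + jobId + "/batch", "text/csv", csv);

        // 3. Close the job so Salesforce knows no more batches are coming.
        String closeXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<jobInfo xmlns=\"http://www.force.com/2009/06/asyncapi/dataload\">"
            + "<state>Closed</state></jobInfo>";
        post(INSTANCE + "/services/async/21.0/job/" + jobId,
                "application/xml", closeXml.getBytes("UTF-8"));
    }

    // POST a body to a Bulk API endpoint and return the XML response.
    static String post(String url, String contentType, byte[] body) throws Exception {
        HttpURLConnection c = (HttpURLConnection) new URL(url).openConnection();
        c.setRequestMethod("POST");
        c.setRequestProperty("X-SFDC-Session", SESSION); // Bulk API auth header
        c.setRequestProperty("Content-Type", contentType);
        c.setDoOutput(true);
        OutputStream out = c.getOutputStream();
        out.write(body);
        out.close();
        return new String(readAll(c.getInputStream()), "UTF-8");
    }

    static byte[] readAll(InputStream in) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        for (int n; (n = in.read(chunk)) > 0; ) buf.write(chunk, 0, n);
        in.close();
        return buf.toByteArray();
    }
}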

After this, you can create the flow relationships between the Bulk Exec and the tFileOutputDelimited components. To do this, right-click the tSalesforceBulkExec component and select Row -> Main to connect to the Success output file component. For the flow to the Errors output component, right-click the same way but choose Row -> Reject. Your workspace should then look like the following:

Figure 7 – Workspace with Flows from Bulk Data Component

Set up the output file paths, header settings, and delimiters for each of the output file components. If you click the schema button on the Success file output component, you’ll see that its schema has a couple of additional columns: a “salesforce_id” column and a “salesforce_created” column.

Figure 8 – Schema for the Success file output component
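If a downstream process needs those newly assigned record IDs, the success file is easy to consume. Here’s a small sketch that prints the salesforce_id of each created row; it assumes the simple quoting shown in the sample output later in this article (no embedded commas), and the file path is a placeholder:

import java.io.BufferedReader;
import java.io.FileReader;

// Small sketch that pulls the new record IDs out of the success file.
// Assumes simple quoting with no embedded commas; the path is a placeholder.
public class ReadSuccessFile {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader("C:/Talend/success.csv"));
        in.readLine(); // skip header: Name,Phone,Type,salesforce_id,salesforce_created
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split(",");
            String id = cols[3].replace("\"", "");      // salesforce_id
            String created = cols[4].replace("\"", ""); // "true" for newly inserted rows
            if ("true".equals(created)) {
                System.out.println("Created Account " + id);
            }
        }
        in.close();
    }
}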

The Errors component has a similar setup, except that an “error” column has been added instead of the “salesforce_id” and “salesforce_created” columns.

Figure 9 – Schema for the Error file output component

The final step in the configuration of our job is connecting the file generation part of the job to the data upload part. This lets Talend know to start the load to Salesforce only when the file creation process has completed. To do this, right-click the tSalesforceOutputBulk component that we created earlier and select Trigger -> On Component Ok. This completes the configuration.

Figure 10 – Workspace for the completed AccountLoad job

Now run the job. With luck, you’ll see something similar to the following:

Figure 11 – Running the job

Open your success.csv report to validate your data:

Name,Phone,Type,salesforce_id,salesforce_created
"Acme Lighting","(800) 555-1212","Prospect","0015000000Xn1cmAAB","true"
"American Information Systems","(222) 222-2222","Customer - Direct","0015000000Xn1cnAAB","true"
"Able Moving Company","(999) 999-9999","Customer - Direct","0015000000Xn1coAAB","true"

When you’re using the component to load more realistic data volumes, you’ll need to keep an eye on the number of Bulk API batches you have available within a 24-hour period; a typical organization has 1,000 batches per day. The Bulk API monitoring console can be found in your Salesforce instance under Setup -> Monitoring -> Bulk Data Load Jobs. You can adjust the number of rows to commit per batch on the Advanced Settings tab of the tSalesforceBulkExec component.

Figure 12 – Bulk API Parameters
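A quick back-of-the-envelope calculation shows how fast those batches get used up. The numbers below are illustrative, not Talend defaults: at 10,000 rows per batch, a one-million-row load consumes 100 of a typical org’s 1,000 daily batches.

// Back-of-the-envelope check of daily Bulk API batch consumption.
// All three numbers are illustrative, not Talend defaults.
public class BatchBudget {
    public static void main(String[] args) {
        long rowsToLoad = 1000000;  // size of the planned load
        long rowsPerBatch = 10000;  // the "Rows to Commit" setting
        long dailyLimit = 1000;     // typical per-org batch allowance per day

        // Each batch carries at most rowsPerBatch rows, so round up.
        long batchesNeeded = (rowsToLoad + rowsPerBatch - 1) / rowsPerBatch;
        System.out.println("Batches needed: " + batchesNeeded);            // 100
        System.out.println("Share of daily limit: "
                + (100.0 * batchesNeeded / dailyLimit) + "%");             // 10.0%
    }
}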

The Bulk API components give users a capability that wasn’t previously available in Talend: the ability to see whether a record was successfully inserted into the system and, if not, what the problem was. This feature, combined with the capability to load data in large quantities, makes Talend a legitimate option for production-ready data loads to Salesforce.com.
