Nov 17th, 2021
Posted on Mar 29, 2021 Tools & Tips
Extraction is the method of pulling or reading data from the source system(s). While it will be important to get data into its target end format, avoid doing so at this phase. The main goal of the extraction is to copy data from a source to a place where it can be transformed. This involves identifying the source tables and their relationships and writing to another staging location. The extracted data source may be a database, a flat file, XML, or JSON.
Extraction can be manual, such as running reports, exporting CSV files or running SQL queries. Many relational database management systems can be configured to run extraction jobs automatically and regularly.
There are also helpful third party ETL tools—like Jitterbit, an EightCloud partner—that make it easy to connect to sources and run regular extraction jobs. Jitterbit provides native connectivity to all enterprise applications as well as relational databases, flat files, XML, and cloud data. EightCloud also leverages Jitterbit’s recipes, which provide templated ways to extract data from common data sources.
Whatever method you choose to employ, it’s important to clearly identify the source system(s) and be able to systematically export information. If this is done manually, or there are manual steps involved, use written instructions that can be easily followed to allow for consistent and accurate extractions. Have a plan in place to validate that the extraction was successful. It’s important to master the extraction before moving on to the next step: Transformation.
Transformation involves cleaning, formatting, consolidating, and filtering the extracted data into a format that is ready to load into the target system. For example, imagine a source dataset has an attribute that stores the Country of an Account. The problem is, this Country is free text, leaving plenty of room for spelling mistakes and inconsistent data entry. The source might contain US, U.S, U.S.A, United States, and a variety of other permutations of the same information. Going forward, the target system, Salesforce, will use Country Picklist values. This means the source information needs to be mapped to its new form.
Maybe it will be decided that while some attributes need to be migrated over, others will be ignored. Perhaps we only need some data from Source System A and we need to merge it with more information from Source System B and perform deduplication. All this, and more, is considered transformation.
In a typical RDMS system, such as SQL server, it’s common for an SQL Developer to write jobs that perform these cleansing tasks. If source data is extracted and stored in CSVs or Excel files, it may be necessary to write macros or detailed cleansing and transformation instructions to be performed manually.
Tools like Jitterbit provide less stressful and time-consuming ways to transform. Using the declarative and easy-to-build transformation wizard, mapping logic and transformation rules can be easily applied, tested, and run. EightCloud’s partnership with Jitterbit allows us to take advantage of these great tools to deliver accurate results to our clients.
Loading refers to the process of inserting the extracted and transformed data into its destination or a staging environment. Loading can consist of adding new data or a combination of adding new records and updating existing records. Before loading, mappings will need to be maintained between the source and target system’s attributes. It’s important to document the mapping decisions where it can be maintained and versioned.
Another important part of the loading phase is understanding dependencies. For example, Accounts need to be loaded before Contacts. Opportunity Line Items can’t be loaded until Opportunities are inserted and Opportunities can’t be inserted without existing Accounts. The loading order of operations needs to be clearly understood and documented.
Many data migration efforts fall short because, while time is spent in the extraction and transformation stages, it isn’t spent in the loading stage. The difference between a successful and unsuccessful data migration is often the false assumption that loading is done once directly in the production environment. In practice, the best data migrations involve multiple iterations of the entire process, including loading until the process is completed without errors.
Loading can be done through tools like Salesforce Data Loader and import wizard with a series of manual steps – one CSV at a time. Again, third party tools like Jitterbit remove labor intensive steps by providing the tools to easily build mappings, specify targets and build in table dependencies. EightCloud takes advantage of Jitterbit recipes which serve as templates for common data loads – such as Salesforce.
Extraction, transformation, and loading are three core components of a successful data migration. As mentioned previously, it’s important to practice the entire ETL process repeatedly. Time-box your efforts and try to complete a real, end-to-end migration in the allotted time. The best way to identify mistakes before they happen is to go through all the work involved.
Author: Grant Ongstad, Senior Salesforce Consultant
After our first major project with EightCloud, we have done several smaller ones. The relationship is such that they've become a trusted partner we go to, whether something planned ahead taking hundreds of hours, or small quick fixes and solutions. They are there when we need them, without hard sell tactics to make monthly quotas in between... a wise way to keep a relationship."
- Mike Heskitt, Altair
Partners and friends, we are proud to share the new EightCloud brand and website with you. Our transformation reflects our commitment to helping you achieve Salesforce excellence through a focus on efficiency, expertise and genuine partnership.
We wish you the best and look forward to connecting with you soon!