How to clean data in Excel? – eternal problem and not only in Excel! Data cleansing is a key question; either we create a simple analysis. In today’s article, we will go through the most important methods. String manipulations in Excel are very important! Besides the well-known methods, we will introduce special VBA codes that will make the process quicker and more effective. Furthermore, in the last section of the article, we’ll introduce our free Excel add-in and productivity tool.
Effective ways to clean data in Excel
There are many data cleansing methods, although we often forget about this obvious method. Let’s see the basics first!
Sorting data to check extremes and peaks
Is it possible to find errors in a list using a simple sorting? Yes! We have nothing else to do but arranging the data, either growing or decreasing order. Right-click on the cell and choose the ‘Sort Largest To Smallest’ option from the menu. Seldom there can be found extremely small or large, maybe peeking data.
Look at the picture below:
Who would ever think that these kinds of mistakes can be found in several million-line data? Another great reason that data cleansing can never be forgotten about.
Remove duplicates in Excel
Excel supports many options to eliminate duplicates. The filtering of unique values and the removal of duplicated data are two tightly joined operation. The result is the same in both cases: a list of unique values.
To filter for unique values (filtered list), use the Advanced command in the Sort & Filter group on the Data tab.
Select the range of cells! Then go to the Data tab! In the Sort & Filter group, click Advanced. Choose the Unique records only check box and click OK.
Explanation: However, there is one major difference that we cannot ignore. In the process of filtering unique values, it temporarily hides the duplicated values. The keyword is: temporarily! You can back your original list to undo this operation.
If you are using the ‘Remove Duplicates’ tool on the ribbon, you’ll delete duplicates permanently! Pay attention to this! The removed, deleted items will be last after saving the workbook. So be very careful.
Let us see how the duplicate removal values tool works! First, select the range. Go to the Data tab! In the Data Tools group, click Remove Duplicates.
Select one or more columns then click OK.
A popup window is displayed showing how many duplicate values were removed and how many unique values remain.
Use the find and replace function
The advantage of the Find & Replace function is that we can work with it relatively fast in any size data table. Let’s see how to clean data using this function. To find and replace data in a worksheet, follow these steps:
Press Ctrl+H and the Find and Replace dialog box appears.
In the Find What box, enter the data you want to locate. In the Replace With box, enter the data with which you want to replace the found data.
If you want to replace all occurrences at the same time, click Replace All.
Check the Type of Data in a Cell
Wait a minute before we make any changes to the raw data.
We have to check the type of data is in the cell. We can do this by using the TYPE function. The function is one of a group of information functions. Numeric or text data? In the picture below, you can see what kind of results the formula brings in case of different kinds of data.
If the function return with 1, the data type is numeric. If it’s returning with 2, the data type is text.
Convert Numbers Stored as Text into Numbers
If we are importing data from a text file or from an external data source into Excel, an often occurrence that the numbers being stored as text. This can be the source of many problems, but there are ways to avoid mistakes. Follow the next description, step by step!
Select a blank cell then type 1. Press Control + C to select the text range which we would like to convert to numbers.
Right-click and choose Paste and Paste Special. In the dialogue box, select Multiply.
Eliminate blank cells in a list or range
Now we’ll highlight the whole range that also contains the empty lines. Press the F5 button, and from the popup window, choose the special option.
All the cells in the selection that are not blank are deselected, leaving only the blank cells selected.
In the ‘Cells’ section of the ‘Home’ tab, click ‘Delete.’
Finally, select ‘Delete Sheet Rows’ from the drop-down menu.
Clean Data to Split text into columns
There is a possibility for the text of one or many cells to divide into several cells. When can this be useful? Let’s suppose that our data set contains our clients’ first and family names in one cell. We would like to split the names that the first names and family names be in different columns.
Start the ‘Convert Text to Columns Wizard’ and follow the steps below:
Select the range which contains names.
Click Text to Columns in the Data Tools group.
Select the Delimiters checkbox and click Next.
Leave all the checkboxes empty under the Delimiters section. Now check the Space check box (in our example). Click Finish!
Clean Data using Built-in Excel Functions
Concatenate text using TEXTJOIN function
And finally, let’s see the opposite of the previous section. Now we would like to join the text from several cells into one. How can we solve this problem quickly?
The TEXTJOIN function joins (what a surprise) values using a selected delimiter. It provides better functionality than CONCAT. This function allows you to supply a range of cells, and has a setting to ignore empty values.
Check the formula bar! You can use space as a delimiter. The second parameter is TRUE because we would like to ignore the empty cells.
If you are using an earlier version of Excel, you should install 3rd party add-in.
Change Text to Lower – Upper – Proper Case
Let’s see what kind of Excel text transformation formulas we can use for the transformation of the text. It is worth to use them during data cleansing.
The LOWER() formula transforms the given text into lower case letters. Example: LOWER(‘EXCEL’), Result: ‘excel’
The UPPER() formula transforms the given text into upper case letters. Example: UPPER(‘excel’), Result: ‘EXCEL’
The PROPER() formula transforms the first letter of the character string, and the letters standing after none-letters into upper case letters. And transforms all the other letters into lower case letters. Example: PROPER(‘Information is BEAUTIFUL’)
The result of the conversion is: ‘Information Is Beautiful’. Not too often, but we might use these kinds of transformations also.
Remove non-printable characters – The CLEAN formula
This is one of the most special fields of data cleansing! In general, we can say that work is relatively difficult with the use of special text formulas. We have made some short VBA codes for several possibilities.
To remove non-printing characters, use the CLEAN() formula!
Let’s see how it works with the help of the following example! You can check the result using LEN() formula.
Remove unnecessary characters from text using macros
If we only want to keep special characters of a string, the following macro-enabled workbook can help. Download this Workbook to remove alpha, numeric, or non-numeric characters in seconds.
It’s easy to clean data with these smart VBA codes. Go to the Developer tab and click Macros.
Clean extra spaces using the TRIM function
Removing extra spaces from the text is a task that needs more caution. Spaces can be at the beginning, in the middle, or at the end of the text. Hardest to recognize is the letter ones because they are invisible for the naked eye.
We’ll explain the two reasons they have to be removed. First that they can cause mistakes at the use of formulas. Imagine, for example, that the result of the VLOOKUP() formula is incorrect. It may sound funny, but it is not. Some formula mistakes may influence the outcome of the final calculation.
The other reason is the size of the file. Although it is true that the one-character surplus is not baneful. Problems occur when the Excel database contains several million records.
Yes, there are some like this, just think about the external data sources and Power Query output. In this case, the size increase can even be 5% – 10%. We don’t have to depict the negative effects of this…. Use the TRIM() formula and guaranteed that the list will be clean.
Automated Data Cleansing in Excel – Free Add-in
Before we take a deep-dive into this topic, allow us to tell a little story.
We would like to summon an old memory from those times when I didn’t even know what Excel was. After college studies, I got employed at a small company with limited HR resources.
As a beginner, they didn’t dish out the premium tasks for me. Clean data in Excel!
I got the seemingly very boring tasks. This wouldn’t have been a problem on its own. Perfect execution versus deadlines? I have never heard of the Excel string manipulation function, so the solution wasn’t simple. These data cleansing related tasks consumed many night shifts.
We have developed a simple and lightweight VBA application, especially for data cleaning.
We would like the text manipulation, and Excel data processing would be a child’s play for you. Today we publish a universal Excel add-in tool. From now, string manipulation and quick data cleansing will be very simple!
Clean Data using Excel VBA – String Manipulation Functions
The first function is widely used with Excel text styled data. We can also use classical Excel formulas, but there are cases when this is just not enough. In the following picture, you can see what kind of Excel string manipulation functions we have built-in.
As a first step, highlight the range that you would like to apply for the text transformation. After that, start the program from the ribbon! From the list, choose the Change Case element and select your preferred operation.
I think these functions do not require further explanations. Apply the selected function on the highlighted data range, and the task is done.
Insert text after specific nth characters
With the use of some Excel functions, we can resolve that we insert some arbitrary characters in front of or behind a given string. The problem starts when we have to perform this operation on an Excel database. By choosing the ‘Add Text’ option, we can manipulate huge data sets in a matter of seconds! Execution this task manually is the real mission impossible case using simple Excel formulas.
From the Excel string manipulation formulas we have to emphasize the SEARCH(), FIND(), MID(), LEFT(), RIGHT() functions.
Now imagine that comfortable situation that you don’t have to create endless formulas ever again! Simply start the functions can be seen in the picture.
Delete the text after nth character
Out of the so far introduced text transformations, some can be done with the use of previously described formulas. The following will be a real treat for the professionals working in the field of business. Let’s stay a little more with the string manipulation functions. Opposite to the previous paragraph here, we don’t insert the characters but remove them.
OK, at this point, all people who good at Excel would say the following sentence: Let’s cut the characters from the front and back. And what if we have to do this from the middle of the text or starting from the fifth character? Probably this is the moment when the average user would throw the keyboard from the desk…
Please try the remove by position function; it is a very effective working tool. Look through the list, and your blood pressure will quickly be restored.
Remove spaces to clean data
The next thing to do is to remove the spaces from the text. This is not complicated; it simply can be done by using the Excel TRIM function. In this case, we’ve done nothing else then automatized this function. You don’t need to apply any formulas, just use the Remove Spaces and All Spaces command.
Remove special characters from a text
Finally, let’s see the special characters. We distinguish three different types. The first one is the non-printable characters, the second is the alpha characters, and the third one is the numerical data. The tools that can be seen in the picture are capable of removing only one type of character from the text.
Imagine how useful can be this Excel application if we have to cleanse a database containing phone numbers. The cells have numbers and other characters also.
Let’s say we need data that only can contain numbers. And the list of phone numbers five thousand rows. What do you think about how long would it take to remove the hyphens and other non-numerical characters? It takes too long, only thinking about it!
Use Excel and automated data cleaning functions! The execution time narrows down to seconds. It’s great!
Final Thoughts: Use Effective Data Cleansing Tools
Let’s summarize what have we learned today! We have introduced the most important text transformations and their field of use. The free data cleansing add-in will take a lot of weight off your shoulders. In case you would like to increase your knowledge of Excel to visit the well-known Excel forums.
There is no shame in learning from professionals. That’s what we have done in the early days. The Excel string manipulation function is a very large subject that needs time and attention. As we can see from the article, Excel offers many possibilities to improve data quality.
Some functions we made with VBA macros, you should check these out closely. We hope that with the help of the introduced data cleansing techniques, you will create a flawless analysis.
We have a goal to make a complex add-in for you that will save you a lot of time. Our plan is to implement even more data cleansing functions to the add-in in the future. We are working on it!
Additional Resources and Downloads
- Download DataXL free excel add-in
- Remove first character in Excel
- Consolidate multiple worksheets
- How to split Excel Worksheets
- Compare two columns in Excel
- String manipulation formulas