How to Clean Data in Excel

Our definitive guide will show you 15 ways to clean data in Excel. Learn more about data cleansing through useful examples.

Clean data and string manipulation formulas in Excel are critical! In addition to the well-known techniques, we will introduce special VBA codes to make the process quicker and more effective.

Table of contents:

  1. Sorting data to check extremes and peaks
  2. Remove duplicates to clean data
  3. Use the find and replace function
  4. Check the type of data in a cell
  5. Convert numbers stored as text into numbers
  6. Eliminate blank cells in a list or range
  7. Clean data by splitting the text into columns
  8. Concatenate text using the TEXTJOIN function
  9. Change text to lower–upper–proper case
  10. Remove non-printable characters using the CLEAN formula
  11. Remove unnecessary characters from text automatically
  12. Clean extra spaces using the TRIM function
  13. Insert text after specific nth characters
  14. Delete the text after the nth character
  15. Remove special characters from a text
  16. Regular Expressions

Introduction to Data Cleansing in Excel

Data cleansing is Excel, not just a task; it is an art that turns raw data into a masterpiece of insights and information. The path to learning this art begins with understanding the foundation: Excel. Our guide does not just walk you through the fundamental steps; it searches into the lesser-known methods, revealing 15 effective ways to clean your data. We introduce a world where Excel’s string manipulation formulas and VBA codes become your friends in the battle against data clutter. We will show you special VBA codes that accelerate the cleansing process, making it effective and efficient.

Imagine the satisfaction of filtering through millions of data points to identify the outliers or the aid when you remove duplicates with a simple click, ensuring a clean dataset. We will explain multiple scenarios with practical examples, demonstrating the power of Excel’s functions – like ‘Sort’, ‘Text to Columns’, ‘TRIM’, and the ‘CLEAN’ formula.

But the tutorial is not just about cleaning what is visible. It goes beyond addressing the invisible threats like non-printable characters and excess spaces that lurk within your sheets, waiting to undermine your analysis. Moreover, we tell a personal story and provide tips that make the journey through data cleansing a learning adventure. It is about transforming a boring task into an opportunity to explore the full potential of your data.

Finally, we introduce our Excel add-in, DataXL, transforming complex string manipulations into child’s play.

Effective ways to clean data in Excel

Let us see the basics first! Sometimes, the best method is to go “back to the roots” and start the data cleansing project with simple but effective solutions.

Prologue

Before we take a deep dive into this topic, allow us to tell a personal story.

After college, I worked at a small company with limited HR resources. As newbies, they didn’t dish out the easy tasks for me. Okay, let’s clean the data in Excel. I got countless boring tasks. The main issue was how to create clean data just in time. I had never heard of the Excel string manipulation functions, so the solution wasn’t simple – many night shifts; good, old days.

After that, we formed our own company and worked on various plans. Finally, we had enough resources to develop a powerful Excel add-in for data cleansing.

clean-data-in-excel-add-in

We aim to make text manipulation and Excel data processing a child’s play for you. So today, we publish a universal Excel add-in tool. From now on, string manipulation and data cleansing will be easy! Read more about the utility.

#1 – Sorting data to check extremes and peaks

Is it possible to find errors in a list using simple sorting? Yes! We have nothing to do but arrange the data in a growing or decreasing order. Right-click on the cell and choose the ‘Sort Largest To Smallest’ option from the menu. Seldom can there be found extremely small or large, maybe peeking data?

Look at the picture below:

clean data example 1 peaks

Who would ever think you could find these mistakes in several million records? Alternatively, conditional formatting can find errors and blanks in a range.

#2 – Remove duplicates to clean data

Excel supports many options to eliminate duplicates. For example, two tightly joined operations filter unique values and remove duplicated data. The result is the same in both cases: a list of unique values.

To filter for unique values (filtered list), use the Advanced command on the Data tab in the Sort & Filter group.

sort and filter group data

Select the range of cells, then go to the Data tab. In the Sort & Filter group, click Advanced. Next, select the Unique Records Only check box and click OK.

clean data unique records only

Explanation: However, there is one significant difference that we cannot ignore. In the process of filtering unique values, it temporarily hides the duplicated values. You can go back to your original list to undo this operation.

You’ll permanently delete duplicates using the ‘Remove Duplicates’ tool on the ribbon. Pay attention to this! The removed items will last after saving the Workbook, so be very careful.

Let’s see how the duplicate removal values tool works! First, select the range. Then, go to the Data tab! In the Data Tools group, click Remove Duplicates.

remove duplicates quickly

Select one or more columns, then click OK. A message box will appear, showing how many duplicated values are removed and how many unique ones remain.

removed values popup

Finally, read this definitive guide to learn more about removing duplicates.

#3 – Use the find and replace function

The advantage of the Find & Replace function is that we can work with it relatively fast in any size data table. First, let’s see how to clean data using this function. To find and replace data in a worksheet, follow these steps:

Press the Ctrl+H keyboard shortcut, and the Find and Replace dialog box appears.

In the Find What box, enter the data you want to locate. Then, enter the Replace With box with the data you want to replace.

Use the find and replace function

If you want to replace all occurrences simultaneously, click Replace All.

#4 – Check the Type of Data in a Cell

Please wait a minute before we make any changes to the raw data.

We have to check the type of data in the cell. We can do this by using the TYPE function, which is one of a group of information functions. Is it numeric or text data? In the picture below, you can see what kind of results the formula brings for different types of data.

If the function returns 1, the data type is numeric. If it’s returning with 2, the data type is text.

Check the Type of Data in a Cell to clean data

#5 – Convert Numbers Stored as Text into Numbers

If we import data from a text file or an external data source into Excel, Excel often stores numbers as text. It can be the source of many problems, but there are ways to avoid mistakes.

How to convert numbers stored as text into numbers?

Create a helper column and type 1. Next, apply a simple multiply using this formula: number (stored as text) * 1. A result is a number.

Convert Numbers Stored as Text into Numbers

Read more about how to convert text to numbers.

#6- Eliminate blank cells in a list or range

Now, we’ll highlight the whole range containing the empty lines. Press the F5 key and choose the ‘Special’ option from the popup window.

All the cells in the selected range that are not blank are deselected, leaving only the blank cells selected.

Eliminate blank cells in a list or range

Click ‘Delete‘ in the ‘Cells’ section of the ‘Home‘ tab.

Finally, select ‘Delete Sheet Rows‘ from the drop-down menu.

#7 – Split text into columns

It’s easy to split text into multiple cells. For example, suppose that our data set contains our clients’ first and family names in one cell. We want to split the names so that the first and family names are in different columns.

Start the ‘Convert Text to Columns Wizard’ and follow the steps below:

First, select the range that contains names. Next, click Text to Columns in the Data Tools group.

text to columns

Select the Delimiters checkbox and click Next.

click next

Under the Delimiters section, leave all the checkboxes empty. Now check the Space check box (in our example). Click Finish!

split names final step

#8 – Concatenate text using the TEXTJOIN function

Let’s see the opposite of the previous section. Now, we want to combine the text from several cells into one. How can we solve this problem quickly?

The TEXTJOIN function joins (what a surprise) values using a selected delimiter. It provides better functionality than CONCAT. This function allows you to supply a range of cells and has a setting to ignore empty values.

Take a look at the formula bar! You can use space as a delimiter. The second parameter is TRUE because we want to ignore the empty cells (if available).

Concatenate text using the TEXTJOIN function

If you are using an earlier version of Excel, you should install 3rd party add-in.

#9 – Change Text to Lower – Upper – Proper Case

Let’s see what kind of Excel text transformation formulas we can use to transform the text. It is worth using them during data cleansing.

  • LOWER() transforms the given text into lowercase letters. Example: LOWER(‘EXCEL’), result: ‘excel’
  • UPPER() changes the given text into upper case letters. Example: UPPER(‘excel’), result: ‘EXCEL’
  • PROPER() transforms the first letter of the character string and the letters standing after non-letters into upper case letters.

Example:

=PROPER(‘Information is BEAUTIFUL’)

clean data to change text into proper case

The result of the conversion is, ‘Information Is Beautiful. ‘ We might also use these kinds of transformations, but not too often.

#10 – Remove non-printable characters – The CLEAN formula

Removing non-printable characters is one of the most advanced parts of data cleansing! We can say that work is relatively complex with special text formulas. We have made some short VBA codes for several possibilities.

The non-printable Unicode characters are 129, 141, 143, 144, and 157.

Let’s see how it works with the help of the following example. The text in cell B3 contains non-printable characters. So, first, you can check the result using LEN() formula. Then, to remove the unnecessary text parts, use the CLEAN function.

remove non-printable characters

#11 – Remove unnecessary characters from text automatically

If we only want to delete special characters, the following tool can help. Download our add-in to remove alpha, numeric, or non-numeric characters in seconds.

Remove unnecessary characters from text using macros

#12 – Clean extra spaces using the TRIM function

Removing extra spaces from the text needs more attention.

You can find spaces at the text’s beginning, middle, or end. The letters are the hardest to recognize because they are invisible to the naked eye.

We’ll explain the two reasons they have to be removed. First, they can cause mistakes in formula use. Imagine, for example, that the result of the VLOOKUP() formula is incorrect. It may sound funny, but it is not. Some formula mistakes may influence the outcome of the final calculation.

Clean extra spaces using the TRIM function

The other reason is the file size. However, the one-character surplus is not baneful. Problems occur when the Excel database contains several million records.

Yes, there are some like this. Just think about the external data sources and Power Query output. In this case, the size increase can be 5% – 10%. We don’t have to depict the harmful effects of this. Use the TRIM() formula and guarantee that the list will be clean.

#13 – Insert text after specific nth characters

Using some Excel functions, we can resolve that we insert some characters in front of or behind a given string. The problem starts when we perform this operation on an Excel database.

By choosing the ‘ Text Tools ‘ option, we can manipulate massive data sets in seconds! However, executing this task manually is challenging if you use simple Excel formulas.

Insert text after specific nth characters

From the Excel string manipulation formulas, we have to emphasize the SEARCH(), FIND(), MID(), LEFT(), and RIGHT() functions.

Imagine that comfortable situation where you don’t have to create endless formulas again! Instead, start the functions that you can see in the picture.

#14 – Delete the text after the nth character

The following data cleansing task is a real threat to Excel users. Let’s stay a little longer with the string manipulation functions. Opposite to the previous paragraph, we don’t insert the characters but remove them.

At this point, all good Excel users would say, “Let’s cut the characters from the front and back.” What if we do this from the middle of the text or starting from the fifth character?

This is probably the moment when the average user throws the keyboard from the desk.

Delete the text after the nth character

Please try the remove-by-position function; it is a very effective working tool. Then, look through the list. The result will quickly restore your blood pressure.

#15 – Remove special characters from a text

Finally, let’s see the special characters. We distinguish three different types. The first is the non-printable characters, the second is the alpha characters, and the third is the numerical data. The tools seen in the picture can remove only one type of character from the text.

Imagine how useful this Excel application would be if we had to cleanse a database containing phone numbers. The cells have numbers and other characters.

Remove special characters from a text

Let’s say we need data that only can contain numbers. And the list of phone numbers has five thousand rows. What do you think about how long would it take to remove the hyphens and other non-numerical characters? It takes too long, only thinking about it! Use Excel and automated data-cleaning functions! The execution time narrows down to seconds. It looks great!

#16 – Regular Expressions

Meet the new regex functions in Excel!

Final Thoughts: Use Effective Methods to Clean Data in Excel

Let’s summarize what we have learned today! We have introduced the most important text transformations and their field of use. The free data cleansing add-in will take a lot of weight off your shoulders. To increase your knowledge of Excel, visit the well-known Excel forums.

There is no shame in learning from professionals. That’s what we did in the early days. However, Excel string manipulation functions are a huge subject that needs time and attention. As the article shows, Excel offers many possibilities to improve data quality.

We created some functions with VBA macros; you should check these out closely. We hope you will create a flawless analysis with the help of the data cleansing techniques that were introduced.

Your Path to Becoming a Data Cleansing Expert

As you embark on this journey, remember that the road to mastering data cleansing in Excel is both rewarding and continuous. The skills you develop will improve your immediate data analysis tasks and set a foundation for advanced data management techniques. We encourage you to leverage these case studies and sample projects as stepping stones toward becoming a confident and competent data analyst.

Additional resources: