A lot of times, I convert his queries into Excel tutorials on this site, as it could be helpful for my other readers as well. This is also one such tutorial. My friend called me last week with the following issue: There is address data in a column in Excel, and I want to identify/filter cells the where the address has duplicate text strings (words) in it. Here is the similar dataset in which he wanted to filter cells that have a duplicate text string in it (the ones with red arrows):
Now what makes this difficult is that there is no consistency in this data. Since this is a compilation of data set that has been manually created by sales reps, there can be variations in the dataset. Consider this:
Any text string could repeat in this dataset. For example, it could be the name of the area or the name of the city or both. The words are separated by a space character, and there is no consistency in whether the city name would be there after six words or eight words. There are thousands of records like this, and the need is to filter those records where there are any duplicate text strings.
After considering many options (such as text to columns and formulas), I finally decided to use VBA to get this done. So I created a custom VBA function (‘IdDuplicate’) to analyze these cells and give me TRUE if there is a duplicate word in the text string, and FALSE in case there are no repetitions (as shown below):
This custom function analyzes each word in the text string and checks how many times it occurs in the text. If the count is more than 1, it returns TRUE; else it returns FALSE. Also, it has been created to only count words more than three characters. Once I have the TRUE/FALSE data, I can easily filter all the records that are TRUE. Now let me show you how to do this in Excel.
VBA Code for the Custom Function
This is done by creating a custom function in VBA. This function can then be used as any other worksheet function in Excel. Here is the VBA code for it: Thanks Walter for suggesting a better approach to this code!
How to Use this VBA Code
Now that you have the VBA code, you need to place it in the backend of Excel, so that it can work as a regular worksheet function. Below are the steps to put the VBA code on the backend: Once you have the VBA code in the back end, you can use the function – ‘IdDuplicates’ as any other regular worksheet function. This function takes one single argument, which is the cell reference of the cell where you have the text. The result of the function is TRUE (if there are duplicate words in it) or FALSE (if there are no duplicates). Once you have this list of TRUE/FALSE, you can filter the ones with TRUE to get all the cells that have duplicate text strings in it. Note: I have created the code only to consider those words that are more than three characters long. This ensures that if there are 1, 2, or 3 character-long words (such as 12 A, K G M, or L D A) in the text string, these are ignored while counting the duplicates. If you want, you can easily change this in the code. This function will only be available in the workbook where you have copied the code in the module. In case you want this to be available in other workbooks as well, you need to copy and paste this code in those workbooks. Alternatively, you can also create an add-in (enabling which would make this function available in all the workbooks on your system). Also, remember to save this workbook in .xlsm extension (since it has a macro code in it).
Excel Advanced Filter – A Complete Guide with Examples. Dynamic Excel Filter Search Box – Extract Data as you Type. The Ultimate Guide to Find and Remove Duplicates in Excel. Separate First and Last Name in Excel (Split Names Using Formulas)
Example : test test, – False I have updated the code and now it’s shorter and easier to read Function IdDuplicates2(rng As Range) Dim StringtoAnalyze As Variant Dim i As Integer On Error Resume Next IdDuplicates2 = “False” StringtoAnalyze = Split(rng.Value, ” “) i = LBound(StringtoAnalyze) If UBound(StringtoAnalyze) – LBound(StringtoAnalyze) > 0 Then Do While i <= UBound(StringtoAnalyze) And IdDuplicates2 = “False” If StringtoAnalyze(i – 1) = StringtoAnalyze(i) Then IdDuplicates2 = “True” i = i + 1 Loop End If End Function Function IdDuplicates2(rng As Range) Dim StringtoAnalyze As Variant Dim i As Integer On Error Resume Next IdDuplicates2 = “False” StringtoAnalyze = Split(rng.Value, ” “) i = LBound(StringtoAnalyze) + 1 If (UBound(StringtoAnalyze) – LBound(StringtoAnalyze)) > 0 Then Do While i <= UBound(StringtoAnalyze) And IdDuplicates2 = “False” If StringtoAnalyze(i – 1) = StringtoAnalyze(i) Then IdDuplicates2 = “True” Exit Do End If i = i + 1 Loop End If End Function Function IdRepeated(rng As Range) As Boolean Dim StringtoAnalyze As Variant Dim i As Integer Dim j As Integer Const minWordLen As Integer = 4 On Error Resume Next IdRepeated = False StringtoAnalyze = Split(rng.Value, ” “) ‘ remove short words from array For i = (UBound(StringtoAnalyze) – 1) To LBound(StringtoAnalyze) Step -1 If Len(StringtoAnalyze(i)) < minWordLen Then For j = i To UBound(StringtoAnalyze) – 1 StringtoAnalyze(j) = StringtoAnalyze(j + 1) Next ReDim Preserve StringtoAnalyze(UBound(StringtoAnalyze) – 1) End If Next j = UBound(StringtoAnalyze) If (Len(StringtoAnalyze(j)) < minWordLen) Then ReDim Preserve StringtoAnalyze(j – 1) End If ’ Look for repeated words For i = LBound(StringtoAnalyze) To UBound(StringtoAnalyze) – 1 For j = i + 1 To UBound(StringtoAnalyze) If (StringtoAnalyze(i) = StringtoAnalyze(j)) Then IdRepeated = True Exit Function End If Next Next End Function Also made a change to identify words even if in different cases. Function IdDuplicates(rng As Range) As String Dim StringtoAnalyze As Variant Dim i As Integer Dim j As Integer Const minWordLen As Integer = 4 StringtoAnalyze = Split(UCase(rng.Value), ” “) For i = UBound(StringtoAnalyze) To 0 Step -1 If Len(StringtoAnalyze(i)) < minWordLen Then GoTo SkipA For j = 0 To i - 1 If StringtoAnalyze(j) = StringtoAnalyze(i) Then IdDuplicates = “TRUE” GoTo SkipB End If Next j SkipA: Next i IdDuplicates = “FALSE” SkipB: End Function Will update on the site soon.