To search a specific char or substring in a string, we have the indexOf()
function :
fun main() {
val string = "GrandMaster"
println(string.indexOf('m', ignoreCase = true)) // Prints 5
println(string.indexOf("master", ignoreCase = true)) // Prints 5
}
indexOf()
functions either returns index of first occurrence of the query (if found) or -1 (if not found).
Lets implement these functions on our own.
indexOf(char)
indexOf(char)
Looking for a character is straight forward. We just have to iterate over the string and check if char
is equal to string[index]
. If found equal, we return the index. If not found, loop is exited and we return null.
Implementation
Implementation
fun String.indexOf(char: Char, ignoreCase: Boolean = false): Int? {
// Iterate
forEachIndexed { index, c ->
// If found, return index
if (c.equals(char, ignoreCase)) return index
}
// If we reach here, then char not found
return null
}
indexOf(string)
indexOf(string)
Finding a substring requires multiple subsequent comparisons depending on the length of query.
For example, to search for the query ili
in the string ability
, we have to check entire substrings of length of 3 at each occurence of i
in the string i.e. at index range 2..4
and 4..6
.
There are multiple ways to approach this problem. Lets understand the simplest but expensive approach first and then keep improving the approach to make it more efficient.
Using windows
Using windows
The most naive approach is to check query against all substrings of the string. To get all substrings of a string of given length, we can use the windowed()
function.
fun String.indexOfUsingWindowed(query: String, ignoreCase: Boolean = false): Int? {
windowed(query.length).forEachIndexed { index, substring ->
if (substring.equals(query, ignoreCase)) return index
}
return null
}
Lets analyse the number of comparisons made by this approach.
Let n
and m
be the length of string and query respectively where n≥m
. Number of substrings of length m
in a string of length n
is n-m+1
. In the worst case, for each substring we perform m
comparisons against the query. In total, we perform (n-m+1) * m
= m(n+1) - m^2
comparisons.
Example :
Input = "Ability", Query = "lit"
No. of substrings of length 3 in Input = 7 - 3 + 1 = 5
Substrings of length 3 = ["Abi", "bil", "ili", "lit", "ity"]
Worst case comparisons = 5 * 3 = 15
Can we perform better than this? Of course we can! Instead of comparing each and every substring, its better to compare only the substrings that have the same first character as that of query.
Using substring
Using substring
Instead of getting all the substrings using windowed()
function, we are interested only in those that start with the first character of query.
Algorithm
Algorithm
To find out and compare against those substrings, we follow the algorithm :
Iterate over the string and for each character,
- If equal to query.first(),
- find the possible end of query in the string by index + query.length
- if possibleEnd ≤ string.length, get the substring string[index until possibleEnd] & compare with query
- if found equal, return index
- If not found, exit the loop and return null
Example
Example
Input = "Ability", Query = "ity"
Iterating over the input string :
[0] = 'A' != 'i' -> continue
[1] = 'b' != 'i' -> continue
[2] = 'i' == 'i' -> [2 until 5] = "ili" != "ity" -> continue
[3] = 'l' != 'i' -> continue
[4] = 'i' == 'i' -> [4 until 7] = "ity" == "ity" -> return 4
Number of comparisons = 7 + 3 = 10
Implementation
Implementation
fun String.indexOfUsingSubstringAlt(query: String, ignoreCase: Boolean = false): Int? {
forEachIndexed { index, c ->
// If matches to first char of query
if (c.equals(query.first(), ignoreCase)) {
// Evaluate possible end of query in string
val possibleEnd = index + query.length
// Check only if in bounds
if (possibleEnd <= length) {
if (query.equals(substring(index, possibleEnd), ignoreCase)) {
return index
}
}
}
}
// If we reach here, then query not found
return null
}
Through this algorithm, we have reduced the number of comparisons. We proceed to find and compare the substring only if first characters match.
The total number of comparisons depend on the number of occurrence of first character of query in string. However in best case, we just need to make n+m
comparisons.
There is one more problem though. This approach requires extracting the substring of query length even if only the first characters match and rest of the characters do not. In the above Ability
example, extracting out the [2 until 5]
substring ili
was an overhead because only first characters i.e. i
match and rest do not when comparing with query ity
. This makes it memory inefficient.
Can we further improve our program? Yes, we can! Looping over the string instead of using windowed()
function, we gained control over which substrings to extract and compare. Now we need control over substring matching process, so that we don’t extract the substring.
Using pointers
Using pointers
In this approach, we won’t be using the substring() function because it led us to extract out the entire substring even on partial matches. Instead we will use pointers here.
The string loop variable index can be thought of as a pointer. On each iteration it points to a character in the string. To keep track of which characters to match, we need two pointers (variables) - one for input string i
and one for query j
. Both will start from 0
. i
will be incremented on each iteration but j
will be incremented based on the comparison result.
Algorithm
Algorithm
- Initialize pointer
j
to0
- For each character
c
at indexi
in input string,- if
c
equalsquery[j]
, incrementj
. One character is matched, we need to match the next character of query on next iteration. - else (found unequal) and if
j>0
, resetj
to0
.j>0
indicates we have matched 1 or more characters of query, but now unequal characters are found, so we have to restart matching query. - if
j==query.length
, then return the index(i - query.length + 1)
.j
pointing toquery.length
indicates all characters of query being matched. We simply return the start of substring as the index.
- if
- if we exit the loop, return null (not found)
Example
Example
Input = "Ramanujan", Query = "anuj"
i=0, j=0
'R' != 'a' -> i=1
'a' == 'a' -> i=2, j=1
'm' != 'n' -> i=3, j=0
'a' == 'a' -> i=4, j=1
'n' == 'n' -> i=5, j=2
'u' == 'u' -> i=6, j=3
'j' == 'j' -> i=7, j=4
j == Query.length -> return (7-4+1) = 4
Implementation
Implementation
fun String.indexOf(query: String, ignoreCase: Boolean = false): Int? {
// Pointer for query
var j = 0
// Iterate over this string
forEachIndexed { i, c ->
// If chars at both pointers are equal (partial match), increment query pointer
if (c.equals(query[j], ignoreCase)) j++
// If unequal and already matched >1 chars, restart
else if (j > 0) j = 0
// If all chars of query matched, return true
if (j == query.length) return (i - query.length + 1)
}
// If we reach here, then query not found
return null
}
This approach is much better than the previous two. It doesn’t require forming substrings. It simply uses iteration over input string. The best part is - it requires only n
comparisons in the worst case (n
being length of input string).