Substring with Concatenation of All Words
Substring with Concatenation of All Words
Given a string s and a list of words words, return all starting indices of substrings in s that are a concatenation of every word exactly once (in any order) with no intervening characters. All words have the same length.
Brute force checks each start with O(n * k) scanning. A sliding window by word length can do this in near-linear time.
Function signature
func FindSubstring(s string, words []string) []int
Inputs / outputs
s: the search string.words: list of words (all same length).- Return: starting indices of valid concatenations.
Example
s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]
Constraints
0 <= len(s) <= 100_0000 <= len(words) <= 5000- All words have equal length.
Core idea (near-solution)
Treat s as a sequence of fixed-length words (size wordLen). Use a sliding window that moves in steps of wordLen.
For each offset in [0, wordLen-1]:
- Maintain a count map for the current window.
- If a word appears too many times, move the left side forward by one word at a time.
- When the window contains exactly
len(words)words with valid counts, record the start.
Invariant: the window always contains whole words (aligned to wordLen), and counts never exceed the required frequency for any word.
Algorithm (step-by-step)
- If
wordsis empty, return empty. - Compute
wordLen,wordCount, and total lengthtotalLen = wordLen * wordCount. - Build a frequency map
needforwords. - For each
offsetfrom 0 towordLen-1:- Set
left = offset,count = 0, and clearwindowmap. - For
rightfromoffsettolen(s)-wordLenin steps ofwordLen:- Read
word = s[right:right+wordLen]. - If
wordnot inneed: reset window and moveleft = right + wordLen. - Else increment its count; if count exceeds need, shrink from left until valid.
- If
count == wordCount, recordleftand then move left by one word.
- Read
- Set
- Return all recorded indices.
Correctness sketch
- Invariant: the window always contains whole words (aligned to
wordLen), and counts never exceed the required frequency for any word. - The window state exactly matches the elements in
s[l..r]. - Shrinking/expanding preserves validity, so any recorded window is correct.
- The best recorded window is optimal because all candidate windows are considered.
Trace (hand walkthrough)
Trace Walkthrough
Example input:
s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]
Walk:
- wordLen=3, totalLen=6
- index 0: "barfoo" -> {bar,foo} matches -> add 0
- index 3: "foothe" no; index 6: "thefoo" no
- index 9: "foobar" matches -> add 9
Pitfalls to avoid
- Sliding by 1 character instead of
wordLen. - Forgetting to reset when encountering a word not in
need. - Not handling duplicate words in
words.
Complexity
- Time:
O(n)average with hash maps and fixed word length. - Extra space:
O(k)for counts of distinct words.
Implementation notes (Go)
- Slide in steps of
wordLenand maintain counts in a map. - Move the left pointer in word-sized steps when a count exceeds its limit.
Run tests to see results
No issues detected