Substring with Concatenation of All Words

hard · strings, sliding-window

Substring with Concatenation of All Words

Given a string s and a list of words words, return all starting indices of substrings in s that are a concatenation of every word exactly once (in any order) with no intervening characters. All words have the same length.

Brute force checks each start with O(n * k) scanning. A sliding window by word length can do this in near-linear time.

Function signature

func FindSubstring(s string, words []string) []int

Inputs / outputs

s: the search string.
words: list of words (all same length).
Return: starting indices of valid concatenations.

Example

s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]

Constraints

0 <= len(s) <= 100_000
0 <= len(words) <= 5000
All words have equal length.

Core idea (near-solution)

Treat s as a sequence of fixed-length words (size wordLen). Use a sliding window that moves in steps of wordLen.

For each offset in [0, wordLen-1]:

Maintain a count map for the current window.
If a word appears too many times, move the left side forward by one word at a time.
When the window contains exactly len(words) words with valid counts, record the start.

Invariant: the window always contains whole words (aligned to wordLen), and counts never exceed the required frequency for any word.

Algorithm (step-by-step)

If words is empty, return empty.
Compute wordLen, wordCount, and total length totalLen = wordLen * wordCount.
Build a frequency map need for words.
For each offset from 0 to wordLen-1:
- Set left = offset, count = 0, and clear window map.
- For right from offset to len(s)-wordLen in steps of wordLen:
  - Read word = s[right:right+wordLen].
  - If word not in need: reset window and move left = right + wordLen.
  - Else increment its count; if count exceeds need, shrink from left until valid.
  - If count == wordCount, record left and then move left by one word.
Return all recorded indices.

Correctness sketch

Invariant: the window always contains whole words (aligned to wordLen), and counts never exceed the required frequency for any word.
The window state exactly matches the elements in s[l..r].
Shrinking/expanding preserves validity, so any recorded window is correct.
The best recorded window is optimal because all candidate windows are considered.

Trace (hand walkthrough)

Trace Walkthrough

Example input:
s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]

Walk:
- wordLen=3, totalLen=6
- index 0: "barfoo" -> {bar,foo} matches -> add 0
- index 3: "foothe" no; index 6: "thefoo" no
- index 9: "foobar" matches -> add 9

Pitfalls to avoid

Sliding by 1 character instead of wordLen.
Forgetting to reset when encountering a word not in need.
Not handling duplicate words in words.

Complexity

Time: O(n) average with hash maps and fixed word length.
Extra space: O(k) for counts of distinct words.

Implementation notes (Go)

Slide in steps of wordLen and maintain counts in a map.
Move the left pointer in word-sized steps when a count exceeds its limit.

Auto-advance

Run tests to see results

No issues detected