Substring with Concatenation of All Words

hard · strings, sliding-window

Substring with Concatenation of All Words

Given a string s and a list of words words, return all starting indices of substrings in s that are a concatenation of every word exactly once (in any order) with no intervening characters. All words have the same length.

Brute force checks each start with O(n * k) scanning. A sliding window by word length can do this in near-linear time.

Function signature

func FindSubstring(s string, words []string) []int

Inputs / outputs

  • s: the search string.
  • words: list of words (all same length).
  • Return: starting indices of valid concatenations.

Example

s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]

Constraints

  • 0 <= len(s) <= 100_000
  • 0 <= len(words) <= 5000
  • All words have equal length.

Core idea (near-solution)

Treat s as a sequence of fixed-length words (size wordLen). Use a sliding window that moves in steps of wordLen.

For each offset in [0, wordLen-1]:

  • Maintain a count map for the current window.
  • If a word appears too many times, move the left side forward by one word at a time.
  • When the window contains exactly len(words) words with valid counts, record the start.

Invariant: the window always contains whole words (aligned to wordLen), and counts never exceed the required frequency for any word.

Algorithm (step-by-step)

  1. If words is empty, return empty.
  2. Compute wordLen, wordCount, and total length totalLen = wordLen * wordCount.
  3. Build a frequency map need for words.
  4. For each offset from 0 to wordLen-1:
    • Set left = offset, count = 0, and clear window map.
    • For right from offset to len(s)-wordLen in steps of wordLen:
      • Read word = s[right:right+wordLen].
      • If word not in need: reset window and move left = right + wordLen.
      • Else increment its count; if count exceeds need, shrink from left until valid.
      • If count == wordCount, record left and then move left by one word.
  5. Return all recorded indices.

Correctness sketch

  • Invariant: the window always contains whole words (aligned to wordLen), and counts never exceed the required frequency for any word.
  • The window state exactly matches the elements in s[l..r].
  • Shrinking/expanding preserves validity, so any recorded window is correct.
  • The best recorded window is optimal because all candidate windows are considered.

Trace (hand walkthrough)

Trace Walkthrough
Example input:
s = "barfoothefoobarman"
words = ["foo","bar"]
Output: [0,9]

Walk:
- wordLen=3, totalLen=6
- index 0: "barfoo" -> {bar,foo} matches -> add 0
- index 3: "foothe" no; index 6: "thefoo" no
- index 9: "foobar" matches -> add 9

Pitfalls to avoid

  • Sliding by 1 character instead of wordLen.
  • Forgetting to reset when encountering a word not in need.
  • Not handling duplicate words in words.

Complexity

  • Time: O(n) average with hash maps and fixed word length.
  • Extra space: O(k) for counts of distinct words.

Implementation notes (Go)

  • Slide in steps of wordLen and maintain counts in a map.
  • Move the left pointer in word-sized steps when a count exceeds its limit.
Run tests to see results
No issues detected