I saw a post on one of my coding boards talking about a solution for an ‘interview-style’ problem.
The issue was to find the longest common substring.
What this means is: Given the word “abracadabra” you are looking for the longest repeated set of characters.
The longest repeated sub-string is “abra” which appears at positions 0 and 7 (counting from 0).
The solution the person presented involved putting the strings from smallest to largest into a dictionary (hash table) with counts of iterations.
a-4
b-2
ab-2
And then walking the tree to find the largest result. This solution seems odd to me.
My solution is to make a dynamic array that can be queried for contents. Further, doing it from longest to shortest:
Edit gwenix points out a string can’t be repeated until it’s no more than half the length. So start from longest string (len /2) rounded down.
End Edit
Solution:
Get longest string: abracadabra
Is string in array? YES? got answer and bail; No –> add to array
*Reduce look string by 1 character
Move to start of target string:
**Get look string: abracadabr
Is string in array? YES? got answer and bail; No –> add to array
Is look string at end of target string? Yes –> loop back to *; No –> move forward one character and loop back to **
This is massively not optimised for time or processing. But it’s a first stab.
Also, just pseudo code… loop counters would be my next pass.
Comments? Improvements?
Seems to me that a string can’t be repeated until the halfway point, so why not reduce by half to start?
Oooh, good point.
You’re hired. Well, if was hiring. Good addition.
Re: Oooh, good point.
Hey, you and I did work well together before!
Re: Oooh, good point.
Agreed!
Maybe I don’t understand, but …
Why can’t a string be repeated until the halfway point? How about a string that is repeated within the first half? Example: insinuate.
Re: Maybe I don’t understand, but …
This takes into account counting from the largest string to the shortest string.
So, going back to the example of “abracadabra”
We can’t look for duplications of “abraca” which is 6 of 10 characters long because we’d need to move to the next position to compare it. However, there are only 4 characters left, so we won’t find that string again.
As a result, we start from the largest duplicatable string, which in 10 characters would be 5.
Now theoretically if you had a string like “AAAAAA” and looked for “AAAAA” that does exist twice. So I’d have to throw that back to; if that means I should go back to my initial solution 😉
Re: Maybe I don’t understand, but …
Oh, good point. I obviously didn’t think of overlap.
Re: Maybe I don’t understand, but …
I didn’t mean that the repeat would be halfway, but without overlap, the longest string that can repeat can be no more than half the number of characters. “Insinuate” has a total of 9 characters, but the repeating string totals 2 characters — less than half the number of characters.
But as pointed out later, overlap means that my contribution is shot.