Regexp_replace: longest repetitive substring in a string

Is it possible to write a REGEXP_REPLACE that replaces the longest consecutive substring that repeats itself in a string with just one instance of that substring? For example:
1,[2],[2],3 -> 1,[2],3
1,[2,3],[2,3],4 -> 1,[2,3],4
1,2,3,4,[2,3,4,5],[2,3,4,5],6 -> 1,2,3,4,[2,3,4,5],6
If there were several equivalent cases, the first one would be picked:
1,[2,3],[2,3],4,5,4,5,6 -> 1,[2,3],4,5,4,5,6
I would appreciate any help and suggestions if this is doable.
Thank you.
create table t( s varchar2(50) );
insert into t values('1,2,2,3');
insert into t values('1,2,3,2,3,4');
insert into t values('1,2,3,4,2,3,4,5,2,3,4,5,6');
insert into t values('1,2,3,2,3,4,5,4,5,6');

Interesting problem!
943276 wrote:
is it possible to write a regexp_replace to replace longest consecutive substring that repeats itself in a string with just one instance of this substring?
Another way of saying that would be "Find the shortest string s1 such that s1 is s with a repeating substring replaced by just one instance of itself." If we can generate all possible s1's, then we just need to find the shortest one (or one of the shortest, in case of a tie).
943276 wrote:
create table t( s varchar2(50) );
insert into t values('1,2,2,3'); ...
Thanks for posting the CREATE TABLE and INSERT statements; that's very helpful.
Don't forget to say which version of Oracle you're using. The query below works in Oracle 10.1 and up.
Here's one way in pure SQL:
WITH     cntr     AS
(
     -- Counter table: integers 1 .. (length of the longest string - 1)
     SELECT     LEVEL     AS n
     FROM     (
                    SELECT     MAX (LENGTH (s))     AS max_length_s
                    FROM     t
               )
     CONNECT BY     LEVEL     < max_length_s
)
,     got_new_s     AS
(
     -- For each start position n, collapse a repeat that begins exactly at n
     SELECT     t.s
     ,     SUBSTR ( t.s
                  , 1
                  , c.n - 1
                  ) || REGEXP_REPLACE ( SUBSTR (t.s, c.n)
                                      , '^(.+)\1'
                                      , '\1'
                                      )     AS new_s
     ,     c.n
     FROM     t
     JOIN     cntr     c  ON  c.n < LENGTH (t.s)
)
-- Keep the shortest candidate per string; ties go to the smallest n
SELECT     s
,          MIN (new_s) KEEP ( DENSE_RANK FIRST ORDER BY  LENGTH (new_s)
                                                    ,         n
                            )  AS shortest
FROM       got_new_s
GROUP BY   s
;
The first sub-query is a counter table that just produces the integers 1, 2, 3, ..., counting from 1 to the length of the longest string, minus 1.
Got_new_s is the interesting part. In got_new_s, we see if the substring starting at position n starts with a repeating pattern (that is, whether it matches '^(.+)\1'), and, if so, replace the doubled pattern with a single copy. By combining every possible value of n with s, we get all possible replacements.
The main query is just a matter of finding the shortest of those replacements.
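To check the logic outside the database, the same three steps can be sketched in Python. This is only an illustration, not part of the Oracle solution: the function name collapse_longest_repeat is made up for this sketch, and it assumes Python's re module treats '^(.+)\1' the same way REGEXP_REPLACE does for these inputs (greedy group, back-reference).

```python
import re


def collapse_longest_repeat(s):
    """Mirror of the query: try every start position, keep the shortest result.

    Hypothetical helper for illustration only; positions are 0-based here,
    while the SQL counter n is 1-based.
    """
    best = s
    # Same range as the counter table joined with c.n < LENGTH (t.s)
    for n in range(len(s) - 1):
        prefix, suffix = s[:n], s[n:]
        # Collapse a doubled pattern only if it starts exactly at position n
        new_s = prefix + re.sub(r'^(.+)\1', r'\1', suffix, count=1)
        # Strict "<" keeps the earliest n when several results tie on length
        if len(new_s) < len(best):
            best = new_s
    return best


for s in ('1,2,2,3', '1,2,3,2,3,4',
          '1,2,3,4,2,3,4,5,2,3,4,5,6', '1,2,3,2,3,4,5,4,5,6'):
    print(s, '->', collapse_longest_repeat(s))
```

The strict comparison plays the role of the second ORDER BY key (n) in the KEEP ( DENSE_RANK FIRST ...) clause.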
As Dan said, this is really a job for PL/SQL. It would probably be easier to maintain, as well as more efficient, in PL/SQL.
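For what it's worth, a procedural version does not need a regular expression at all: loop over block sizes from longest to shortest, and over start positions from left to right, and stop at the first doubled block. Below is a rough sketch of that loop, written in Python only so that it can be run anywhere; it is not PL/SQL, and the function name collapse_longest_repeat_loop is invented for this example. A PL/SQL function could follow the same shape using SUBSTR.

```python
def collapse_longest_repeat_loop(s):
    """Remove one copy of the longest block that is immediately repeated.

    Hypothetical helper for illustration only. Trying longer blocks first,
    and earlier positions first, means the first hit is the longest repeat,
    with ties going to the leftmost one.
    """
    for size in range(len(s) // 2, 0, -1):          # longest block first
        for pos in range(len(s) - 2 * size + 1):    # leftmost start first
            block = s[pos:pos + size]
            if block == s[pos + size:pos + 2 * size]:
                # Keep the first copy of the block, drop the second
                return s[:pos + size] + s[pos + 2 * size:]
    return s  # no adjacent repeat anywhere: string is unchanged


print(collapse_longest_repeat_loop('1,2,3,2,3,4,5,4,5,6'))
```

Since the function returns as soon as it finds a match, it never builds the full set of candidate strings that the pure SQL approach generates.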
Edited by: Frank Kulash on Jun 30, 2012 4:07 AM
Added c.n to SELECT clause of got_new_s, after Paulie (below).
S                         SHORTEST
1,2,2,3                   1,2,3
1,2,3,2,3,4               1,2,3,4
1,2,3,2,3,4,5,4,5,6       1,2,3,4,5,4,5,6
1,2,3,4,2,3,4,5,2,3,4,5,6 1,2,3,4,2,3,4,5,6

