Home » SQL & PL/SQL » SQL & PL/SQL » String Extract using REGEXP_SUBSTR (Oracle, 12.2.0.1.0, Linux)
String Extract using REGEXP_SUBSTR [message #686739] Fri, 09 December 2022 09:01 Go to next message
Duane
Messages: 557
Registered: December 2002
Senior Member
Is there a way to use a comma ',' as a delimiter but not check for that delimiter within double quotations?

Or, do you just have to use a different delimiter (pipe |) if your data might contain commas.


select StringValue
  from (select replace(regexp_substr(StringValue, '[^,]+', 1, level), '"') StringValue
          from (select 'MA,"Hi John,","CO"' StringValue
                  from dual)
    connect by level <= length(StringValue) - length(replace(StringValue, ',')) + 1)

Output should be:
MA
Hi John,
CO
Re: String Extract using REGEXP_SUBSTR [message #686740 is a reply to message #686739] Fri, 09 December 2022 09:22 Go to previous messageGo to next message
Duane
Messages: 557
Registered: December 2002
Senior Member
This is what I need but it also includes two extra null rows after the results.

Anyone know how to remove those extra null rows without adding "where StringValue is not null"?


select StringValue
  from (select replace(regexp_substr(StringValue, '("[^"]*")|[^,]+', 1, level), '"') StringValue
          from (select 'MA,"Hi John,","CO","Ok, I just do not get it"' StringValue
                  from dual)
    connect by level <= length(StringValue) - length(replace(StringValue, ',')) + 1)


Output is:
MA
Hi John,
CO
Ok, I just do not get it
Null Row
Null Row
Re: String Extract using REGEXP_SUBSTR [message #686741 is a reply to message #686740] Fri, 09 December 2022 11:26 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
You need to provide more detail on string. I'll assume it is comma-separated list of elements where each element is optionally enclosed in double quotes. And enclosed in double quotes element itself can't contain double quotes. If so:

with data as (select 'MA,"Hi John,","CO"' StringValue from dual)
select  StringElementValue
  from  data,
        lateral(
                select  trim(both '"' from regexp_substr(StringValue,'("[^\"]+")|([^\,]+)',1,level)) StringElementValue
                  from  dual
                  connect by level <= regexp_count(StringValue,'("[^\"]+")|([^\,]+)')
               )
/

STRINGELEMENTVALUE
------------------
MA
Hi John,
CO

SQL>
SY.

[Updated on: Fri, 09 December 2022 11:28]

Report message to a moderator

Re: String Extract using REGEXP_SUBSTR [message #686742 is a reply to message #686741] Fri, 09 December 2022 13:30 Go to previous messageGo to next message
Duane
Messages: 557
Registered: December 2002
Senior Member
Hi Solomon,

Yes, it is this "I'll assume it is comma-separated list of elements where each element is optionally enclosed in double quotes."

The string is comma-separated and is optionally enclosed in double quotes.

Valid string:

'Yes,"No",SQL,"SQL"'

'SQL,ORAFaq,Solomon'

'"SQL","Regexp","Substr"'

[Updated on: Fri, 09 December 2022 13:30]

Report message to a moderator

Re: String Extract using REGEXP_SUBSTR [message #686748 is a reply to message #686742] Sat, 10 December 2022 12:08 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
This will take care of it including double quote escaping:

with data as (select 'M\"A\","Hi \"John\",","C\"O","\"Sam\""' StringValue from dual)
select  rownum,
        StringElementValue
  from  data,
        lateral(
                select  replace(
                                regexp_replace(
                                               regexp_substr(
                                                             StringValue,
                                                             --'("([^"]*?(\\")?)*?"|([^",]+?(\\")?[^\,]+))',
                                                             '("([^"]*?(\\")?)*?"|([^",]+?(\\")?)+)',
                                                             1,
                                                             level
                                                            ),
                                               '(^")?(.*?(\\"$)?)("$)?',
                                               '\2'
                                              ),
                                '\"',
                                '"'
                               ) StringElementValue
                  from  dual
                  connect by level <= regexp_count(
                                                   StringValue,
                                                   --'("([^"]*?(\\")?)*?"|([^",]+?(\\")?[^\,]+))'
                                                   '("([^"]*?(\\")?)*?"|([^",]+?(\\")?)+)'
                                                  )
               )
/

    ROWNUM STRINGELEMENTVALUE
---------- --------------------
         1 M"A"
         2 Hi "John",
         3 C"O
         4 "Sam"

SQL>
SY.
Re: String Extract using REGEXP_SUBSTR [message #686749 is a reply to message #686748] Sat, 10 December 2022 12:15 Go to previous messageGo to next message
Duane
Messages: 557
Registered: December 2002
Senior Member
Hi Solomon,

I don't need double quotations. A value within a string would either be within double quotations or not have any at all.

Valid strings would only be these combinations:


'Yes,"No",SQL,"SQL"'

'SQL,ORAFaq,Solomon'

'"SQL","Regexp","Substr"'
Re: String Extract using REGEXP_SUBSTR [message #686750 is a reply to message #686749] Sat, 10 December 2022 15:05 Go to previous message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
So use my first query:

with data as (
              select 'MA,"Hi John,","CO"' StringValue from dual union all
              select 'Yes,"No",SQL,"SQL"' from dual union all
              select 'SQL,ORAFaq,Solomon' from dual union all
              select '"SQL","Regexp","Substr"' from dual
             )
select  StringValue,
        StringElementValue
  from  data,
        lateral(
                select  trim(both '"' from regexp_substr(StringValue,'("[^\"]+")|([^\,]+)',1,level)) StringElementValue
                  from  dual
                  connect by level <= regexp_count(StringValue,'("[^\"]+")|([^\,]+)')
               )
/
STRINGVALUE             STRINGELEMENTVALUE
----------------------- --------------------
MA,"Hi John,","CO"      MA
MA,"Hi John,","CO"      Hi John,
MA,"Hi John,","CO"      CO
Yes,"No",SQL,"SQL"      Yes
Yes,"No",SQL,"SQL"      No
Yes,"No",SQL,"SQL"      SQL
Yes,"No",SQL,"SQL"      SQL
SQL,ORAFaq,Solomon      SQL
SQL,ORAFaq,Solomon      ORAFaq
SQL,ORAFaq,Solomon      Solomon
"SQL","Regexp","Substr" SQL
"SQL","Regexp","Substr" Regexp
"SQL","Regexp","Substr" Substr

13 rows selected.

SQL>
SY.
Previous Topic: Update difference between two dates in months
Next Topic: Months Calculation and update Statement
Goto Forum:
  


Current Time: Fri Apr 19 22:24:07 CDT 2024