Home » SQL & PL/SQL » SQL & PL/SQL » RegEX data parsing (12.1)
RegEX data parsing [message #686652] Thu, 10 November 2022 10:36 Go to next message
lott42_gmail
Messages: 146
Registered: June 2010
Senior Member
I'm not good with Reg Ex parsing but here's one way I accomplished my task. Is there an easier way to parse data within a string?

--
--1: Start: Need to find either the string "ROOT" + "any 1 char" + "CAUSE:" OR "ROOT" + "any 1 char" + "CAUSE" + "any 1 char" + "ASSESSMENT:" 
--2: End:   Then find the first number+period+left paren. ex: "3.(" after #1. There's a few more possible "End" positions like 2.( OR 2. ( OR 2). OR 2) . OR 2. )
--3: Data:  Parse the data between #1 and #2(if NO #2 "End") was found then get rest of string
--
--Sample data
WITH data AS (
SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data' str FROM DUAL UNION ALL
SELECT 3 id, '1.(U) ROOT-CAUSE: Root cause data3   2.(U) other data' str FROM DUAL UNION ALL
SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
)
SELECT parsed_data.str,
       --#3 Parse the data between #1 and #2
       TRIM (
           SUBSTR (
               str,
               start_pos,
               CASE
                   --if NO end anchor was found, then get the rest of input string
                   WHEN end_numb_of_chars = 0 THEN LENGTH (str)
                   ELSE end_numb_of_chars - 1
               END))    root_parsed_data,
       parsed_data.start_pos,
       parsed_data.end_numb_of_chars
  FROM (SELECT data_location.*,
               root_data_start_pos + root_data_str_len    start_pos,
               --#2 Find the first number+period+left paren after #1
               REGEXP_INSTR (
                   SUBSTR (str,
                           --start looking after "root" data was found
                           root_data_start_pos + root_data_str_len,
                           --grab the rest of the string
                           LENGTH (str)),
                   --Need to find "end" position anchor
                   --2.( or 2. ( or 2). or 2) . or 2. )
                   '\d+\.\(|\d+\. \(|\d+\(\.|\d+\)\.')    end_numb_of_chars
          FROM (SELECT d.*,
                       --#1: look for either "ROOT CAUSE" or "ROOT CAUSE ASSESSMENT"
                       REGEXP_INSTR (
                           str,
                           'ROOT.{1}CAUSE:|ROOT.{1}CAUSE.{1}ASSESSMENT:')
                           root_data_start_pos,
                       --#1: look for either "ROOT CAUSE" or "ROOT CAUSE ASSESSMENT". get length of string
                       CASE
                           WHEN REGEXP_INSTR (str, 'ROOT.{1}CAUSE:') > 0
                           THEN
                               LENGTH ('ROOT CAUSE:')
                           WHEN REGEXP_INSTR (str,
                                              'ROOT.{1}CAUSE.{1}ASSESSMENT:') >
                                0
                           THEN
                               LENGTH ('ROOT CAUSE ASSESSMENT:')
                       END
                           root_data_str_len
                  FROM data d) data_location) parsed_data
Expected output

STR                                                                     ROOT_DATA_PARSED
1.(U) ROOT CAUSE: Root cause data1. other data	                        Root cause data1. other data
1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data	Root cause data2
1.(U) ROOT-CAUSE: Root cause data3   2.(U) other data	                Root cause data3
1.(U) TEST STRING: Root cause data9 5.(U)	                       
Re: RegEX data parsing [message #686654 is a reply to message #686652] Fri, 11 November 2022 05:49 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
WITH data AS (
              SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
              SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data' str FROM DUAL UNION ALL
              SELECT 3 id, '1.(U) ROOT-CAUSE: Root cause data3   2.(U) other data' str FROM DUAL UNION ALL
              SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
             )
SELECT  STR,
        REGEXP_SUBSTR(
                      STR,
                      'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
                      1,
                      1,
                      NULL,
                      2
                     ) ROOT_DATA_PARSED
  FROM  DATA
/

STR                                                              ROOT_DATA_PARSED
---------------------------------------------------------------- -----------------------------
1.(U) ROOT CAUSE: Root cause data1. other data                    Root cause data1. other data
1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data  Root cause data2
1.(U) ROOT-CAUSE: Root cause data3   2.(U) other data             Root cause data3
1.(U) TEST STRING: Root cause data9 5.(U)

SQL>
SY.

[Updated on: Fri, 11 November 2022 05:50]

Report message to a moderator

Re: RegEX data parsing [message #686657 is a reply to message #686654] Sat, 12 November 2022 10:20 Go to previous messageGo to next message
lott42_gmail
Messages: 146
Registered: June 2010
Senior Member
Thanks so much Solomon!! I'm putting your pattern in regex101.com to understand the regex pattern better but I think the "2" in REGEXP_SUBSTR is returning the data in the 2nd Capturing Group
Re: RegEX data parsing [message #686658 is a reply to message #686657] Sun, 13 November 2022 12:20 Go to previous messageGo to next message
lott42_gmail
Messages: 146
Registered: June 2010
Senior Member
Solomon, I introduced a Line Feed for ID:3 in the STR value

WITH data AS (
              SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
              SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data' str FROM DUAL UNION ALL
              SELECT 3 id, '1.(U) ROOT-CAUSE: Root 
              cause data3   2.(U) other data' str FROM DUAL UNION ALL
              SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
             )
SELECT  STR,
--        DUMP(str) str_dump,
        REGEXP_SUBSTR(
                      STR,
--                      'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
--                      'ROOT.CAUSE(.ASSESSMENT)?:(^\s*|\s*$)((\d+\.\()|$)',
                      'ROOT.CAUSE(.ASSESSMENT)?:((.|\n)*)((\d+\.\()|$)',
                      1,
                      1,
                      NULL,
                      2
                     ) ROOT_DATA_PARSED
  FROM  DATA
I'd still like to parse the data even if it contains a CR/LF character in there like
Root
 
cause data3
I tried a few variations of where I think the regex should be modified to allow for these line terminators but to no avail...
Re: RegEX data parsing [message #686660 is a reply to message #686658] Mon, 14 November 2022 07:47 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3269
Registered: January 2010
Location: Connecticut, USA
Senior Member
You should start reading documentation:

WITH data AS (
              SELECT 1 id, '1.(U) ROOT CAUSE: Root cause data1. other data' str FROM DUAL union all
              SELECT 2 id, '1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data' str FROM DUAL UNION ALL
              SELECT 3 id, '1.(U) ROOT-CAUSE: Root
              cause data3   2.(U) other data' str FROM DUAL UNION ALL
              SELECT 9 id, '1.(U) TEST STRING: Root cause data9 5.(U)' str FROM DUAL
             )
SELECT  ID,
        STR,
        REGEXP_SUBSTR(
                      STR,
                      'ROOT.CAUSE(.ASSESSMENT)?:(.+?)((\d+\.\()|$)',
                      1,
                      1,
                      'n',
                      2
                     ) ROOT_DATA_PARSED
  FROM  DATA
/
        ID STR                                                                 ROOT_DATA_PARSED
---------- ------------------------------------------------------------------- ------------------------------
         1 1.(U) ROOT CAUSE: Root cause data1. other data                       Root cause data1. other data
         2 1.(U) ROOT*CAUSE ASSESSMENT: Root cause data2   3.(U) other data     Root cause data2
         3 1.(U) ROOT-CAUSE: Root                                               Root
                         cause data3   2.(U) other data                                      cause data3

         9 1.(U) TEST STRING: Root cause data9 5.(U)

SQL>
SY.
Re: RegEX data parsing [message #686661 is a reply to message #686660] Mon, 14 November 2022 11:48 Go to previous message
lott42_gmail
Messages: 146
Registered: June 2010
Senior Member
Thanks again Solomon!! I still need a lot of practice with RegEx that I assumed that I needed to modify the expression. Never occurred to me that "regexp_modifier" option was an issue. Yes, I still have a lot of learning to do but I do appreciate your help plus reading other posts to help with my learning curve.
Previous Topic: Dynamic Procedure To Drop A Database Link
Next Topic: Row into Column and sum two columns
Goto Forum:
  


Current Time: Thu Mar 28 06:28:55 CDT 2024