Help: Verify or Suggest a Simple Regular Expression

I'm trying to map a title to the file-name portion of a URL, so the result needs to follow the rules specified here:
http://labs.apache.org/webarch/uri/rfc/rfc3986.html#unreserved
I identified the following characters as legal: a-z / A-Z / 0-9 / "-" / "." / "_" / "~"
Everything else has to be converted to an underscore.
I came up with the following expression:
someString.replaceAll("[^a-zA-Z0-9-._~]", "_")
Is that correct? I spent a lot of time trying to figure out regular expressions, but it seems like every tool (i.e. PHP, TextPad and now Java) has a slightly different dialect, and to top it off, there are no very good tutorials or explanations. I dread regular expressions!!!
Can anyone help please?

HoganWang wrote:
Your regular expression is right. Regular expressions come in simple and complex versions. If replaceAll is called frequently, it is recommended to compile the regular expression with Pattern first.

How does the expression know that the hyphen isn't part of a range? I guess the only way is that a range needs the hyphen to sit between two letters or two numbers.
In terms of efficiency: this is called once per page request, i.e.
somedomain.com/somecategory/title_title_title
I need to be able to translate title&title$title to title_title_title. It doesn't seem like precompiling the regular expression will speed it up, since it won't remember fields between page requests. Or am I wrong?
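
A minimal sketch of the precompiled variant, assuming the code runs in a long-lived JVM (a servlet container, for example), where a static field is initialized once at class load and is still there on the next request. The names TitleToFileName and toFileName are hypothetical. The hyphen is moved to the end of the character class, where it can only be read as a literal; the set of allowed characters is the same as in the posted expression.

```java
import java.util.regex.Pattern;

class TitleToFileName {
    // Compiled once when the class is loaded; Pattern instances are immutable and thread-safe.
    // The trailing '-' is a literal hyphen because nothing follows it inside the class.
    private static final Pattern NOT_UNRESERVED = Pattern.compile("[^a-zA-Z0-9._~-]");

    // Replaces every character outside the RFC 3986 unreserved set with an underscore.
    static String toFileName(String title) {
        return NOT_UNRESERVED.matcher(title).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(toFileName("title&title$title")); // title_title_title
    }
}
```

If every request really started a fresh process, nothing would be kept and the precompiled form would bring no benefit over String.replaceAll, which compiles the pattern on each call.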

Similar Messages

  • Request for help with the performance of a procedure that uses regular expressions

    Hi All,
    Below is a procedure that populates two tables. For the first table it is a simple insert, but for the second table we need to break up the source record as per the business requirement and then insert the pieces into the table (I have used regular expressions for that).
    The procedure works fine, but it takes around 23 minutes to process 1 million rows.
    Since this procedure will be used in parallel by different ETL processes, the APPEND hint is not recommended.
    Is there any way to improve its performance, or any suggestion if my approach is not optimal? Thanks in advance for all help.
    CREATE OR REPLACE PROCEDURE SONARDBO.PRC_PROCESS_EXCEPTIONS_LOGS_TT
    (
         P_PROCESS_ID       IN        NUMBER,
         P_FEED_ID          IN        NUMBER,
         P_TABLE_NAME       IN        VARCHAR2,
         P_FEED_RECORD      IN        VARCHAR2,
         P_EXCEPTION_RECORD IN        VARCHAR2
    )
        IS
        PRAGMA AUTONOMOUS_TRANSACTION;
        V_EXCEPTION_LOG_ID     EXCEPTION_LOG.EXCEPTION_LOG_ID%TYPE;
        BEGIN
        V_EXCEPTION_LOG_ID :=EXCEPTION_LOG_SEQ.NEXTVAL;
             INSERT INTO SONARDBO.EXCEPTION_LOG
             (
                 EXCEPTION_LOG_ID, PROCESS_DATE, PROCESS_ID,EXCEPTION_CODE,FEED_ID,SP_NAME
                ,ATTRIBUTE_NAME,TABLE_NAME,EXCEPTION_RECORD
                ,DATA_STRUCTURE
                ,CREATED_BY,CREATED_TS
             )
             VALUES
             (   V_EXCEPTION_LOG_ID
                ,TRUNC(SYSDATE)
                ,P_PROCESS_ID
                ,'N/A'
                ,P_FEED_ID
                ,NULL
                ,NULL
                ,P_TABLE_NAME
                ,P_FEED_RECORD
                ,NULL
                ,USER
                ,SYSDATE
             );
            INSERT INTO EXCEPTION_ATTR_LOG
            (
                EXCEPTION_ATTR_ID,EXCEPTION_LOG_ID,EXCEPTION_CODE,ATTRIBUTE_NAME,SP_NAME,TABLE_NAME,CREATED_BY,CREATED_TS,ATTRIBUTE_VALUE
            )
            SELECT
                EXCEPTION_ATTR_LOG_SEQ.NEXTVAL          EXCEPTION_ATTR_ID
                ,V_EXCEPTION_LOG_ID                     EXCEPTION_LOG_ID
                ,REGEXP_SUBSTR(str,'[^|]*',1,1)         EXCEPTION_CODE
                ,REGEXP_SUBSTR(str,'[^|]+',1,2)         ATTRIBUTE_NAME
                ,'N/A'                                  SP_NAME
                ,p_table_name
                ,USER
                ,SYSDATE
                ,REGEXP_SUBSTR(str,'[^|]+',1,3)         ATTRIBUTE_VALUE
            FROM
            (
            SELECT
                 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
            FROM
                DUAL t1 CROSS JOIN
                        TABLE
                        (
                            CAST
                            (
                                MULTISET
                                (
                                    SELECT LEVEL
                                    FROM DUAL
                                    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
                                )
                                AS SYS.odciNumberList
                            )
                        ) t2
            )
            WHERE REGEXP_SUBSTR(str,'[^|]*',1,1) IS NOT NULL;
            COMMIT;
           EXCEPTION
             WHEN OTHERS THEN
             ROLLBACK;
             RAISE;
        END;
    Many Thanks,
    Arpit

    Regexes are known to be CPU-intensive, especially when dealing with a large number of rows.
    If you have to reduce the processing time, you need to tune the SELECT statements.
    One suggestion is to change the following query
    SELECT
                 REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,t2.COLUMN_VALUE) str
            FROM
                DUAL t1 CROSS JOIN
                        TABLE
                        (
                            CAST
                            (
                                MULTISET
                                (
                                    SELECT LEVEL
                                    FROM DUAL
                                    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
                                )
                                AS SYS.odciNumberList
                            )
                        ) t2
    to
    SELECT REGEXP_SUBSTR(P_EXCEPTION_RECORD, '([^^])+', 1,level) str
    FROM DUAL
    CONNECT BY LEVEL <= REGEXP_COUNT(P_EXCEPTION_RECORD, '([^^])+')
    Before looking for any performance benefit, you need to ensure that this does not change your output.
    How many substrings are you expecting in P_EXCEPTION_RECORD? If fewer than 5, it will be better to opt for a combination of SUBSTR and INSTR, as it might work well with the number of records you are dealing with. The only trouble is that you will have to write separate SUBSTR and INSTR expressions for each column to be fetched.
    How are you calling this procedure? Is it not possible to work with collections? Delimited strings are not a very good option, as they require splitting the data every time you need to refer to it.

  • Help!!!!! Regular Expressions!!

    I am trying to use regular expressions for parsing. The package required for that is
    java.util.regex.*;
    I am using the import statement in some sample code, but compiling it gives an error:
    ERRORS:
    Replacement.java:6: package java.util.regex does not exist
    import java.util.regex.*;
    ^
    I have also set the path to C:\jdk1.4\bin
    I have also set the classpath to C:\jdk1.4\lib
    I don't know why it doesn't recognise the java.util.regex package.
    Please help!!
    gaurav_k1

    Have you checked if the regex package is part of the JDK1.4? I can't find it. What classes does it implement?
    Yeah, it has been there since 1.4:
    http://java.sun.com/j2se/1.4/docs/api/java/util/regex/package-summary.html
    I'm not sure what the original problem could be; possibly a previously installed JRE is being picked up? If you had one installed, check the classpaths and uninstall any old JRE (some people forget that, thinking they only need to remove the JDK). Could you give us any more hints?

  • Simple regular expression problem

    Hello,
    I need help with regular expressions. I need to get data from one table into another, and I think my problem can be solved using regular expressions, but I don't know how to use them properly.
    I need to separate a VARCHAR2 field, which is basically number/number, into 2 separate number fields:
    CREATE TABLE tst (CODE VARCHAR2(10));
    INSERT INTO tst VALUES('10/15');
    INSERT INTO tst VALUES('13/12');
    INSERT INTO tst VALUES('30');
    INSERT INTO tst VALUES('15');
    CREATE TABLE tst2 (po NUMBER, co NUMBER);
    I need to get CODE into the po and co columns. I think the result should look something like this:
    INSERT INTO tst2
    SELECT regexp_substr(CODE, 'something here to get the number before /') AS po,
           regexp_substr(CODE, 'something here to get the number after') AS co
    FROM tst;
    Any help appreciated

    Hi Blu,
    Yes, I had tested with a "0" in the figures (like 10/20 and 30/40). It worked that time, and so I replied. :) :)
    But the pattern still has a problem, which I have rectified below.
    For example:
    SQL> select regexp_substr('10/40','[^/][0 9]',1,2) DD from dual;
    DD
    40
    But if I use a non-zero value like 43 (the way you tested), the query below will not return 43.
    SQL> select regexp_substr('15/43','[^/][0 9]',1,2) DD from dual;
    DD
    My pattern had a slight mistake (the "-" was missing between 0 and 9). I changed it and retested. The correct pattern is '[^/][0-9]', and now it returns 43.
    SQL> select regexp_substr('15/43','[^/][0-9]',1,2) dd from dual ;
    DD
    43
    The pattern '[^/]+' also works fine.
    Thank you for pointing this out, Blu; I have learned a lot more about patterns.
    Regards,
    Ashutosh

  • Simple Regular Expression Question

    Or is it not so simple? Decide for yourself:
    If a random text (let's say "Nader") does not contain, for example, the string "Bush", then it shall match the pattern. But what would that pattern look like?
    First I tried ^(Bush), but it doesn't work. Then I tried lots of other things and googling, with no success.
    Any ideas for that simple thing?

    Let me get this straight.
    Text area A will have some text in it. Let's say the user supplies "Bush".
    Right?
    Text area B is allowed to contain any text in it except an exact match of what is in text area A.
    Right?
    if (B.getText().equals(A.getText())) { // or however you get the string from a TextArea
      // Uh-oh! They match!
    }
    But it can't be that simple, else you wouldn't have posted.
    So what are you looking for?
    Can A contain stuff that is to be interpreted as a regex? That is, if A has "B.*sh" then B must not start with "B" and end with "sh"?
    Will either or both of them be line-based? That is:
    A:
    Bush
    Cheny
    B:
    Clinton
    Bush
    and no line from A must match any line from B?
    Maybe I'm just dense, but I still don't really understand what you're looking for, or why you need regex since you seem to be just looking for an exact string match. Initially you said "contain", which suggests " .* Bush .* " but later you seem to be talking about exact matches.
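
If the goal is the "does not contain" reading of the question, one common way to express it is a negative lookahead. The sketch below assumes java.util.regex; the class and method names are hypothetical. Note that ^(Bush) cannot work for this, because outside square brackets ^ is a start-of-input anchor and not a negation, so that pattern looks for text beginning with "Bush".

```java
import java.util.regex.Pattern;

class DoesNotContain {
    // (?!.*Bush) is a negative lookahead: it succeeds only if "Bush" occurs nowhere ahead.
    // DOTALL lets '.' also match line terminators, so multi-line text is covered.
    private static final Pattern NO_BUSH = Pattern.compile("(?!.*Bush).*", Pattern.DOTALL);

    static boolean matchesWithoutBush(String text) {
        // matches() requires the pattern to cover the entire input
        return NO_BUSH.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesWithoutBush("Nader"));        // true
        System.out.println(matchesWithoutBush("George Bush"));  // false
        // Without a regex, the same test is simply:
        System.out.println(!"George Bush".contains("Bush"));    // false
    }
}
```

When the forbidden text is a fixed string, the plain contains() check is usually the clearer choice; the lookahead is mainly useful where only a pattern can be supplied.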

  • [CS 5.5/JS/OSX] Simple regular expression crashes InDesign

    I'm in the process of migrating a script from CS3 to CS5.5. When trying the script with no modifications it causes InDesign to stall, and I have to force close it. After running through the script line by line I found the culprit:
    /A(_|-)?B\.jpg/.test("A-C.jpg")
    This causes the entire script engine to freeze.
    With CS3 it works fine. Also, when I make a slight change, it works fine:
    /A(-|_)?B\.jpg/.test("A-C.jpg")
    Any idea what's going on? Is it a bug in the scripting engine?

    Any idea exactly what kind of expression triggers the bug?
    I rewrote my expression so that it does not contain constructs like /(a|b|abc)?/. Instead I use /[ab]?(abc)?/, which is close enough and does not seem to trigger the bug.

  • Simple regular expression in oracle query

    hi guys, I have this challenge.
    say I have a query:
    select name, user_name, object_type from questions;
    Now, in the object_type column, I can get values that end in 'Q' followed by a number.
    So object_type values can be 00Q1, ABCQ2, 56Q7, etc. It can be any number really.
    The thing is, I want to add a small grouping, so that for rows where object_type ends in Q followed by a number, I get an additional column whose value is 'question'.
    So the query now becomes:
    select name, user_name, object_type, column_type from questions;
    So column_type is 'question' if object_type ends with Q and a number; otherwise it just gets a default value, like 'Others' or something.
    Is this possible, and if so, how can I achieve it, please?
    Thanks very much.

    Hope this will help.
    SQL> with t as
      2  ( select '00Q1' element_name from dual union all
      3    select 'ABCQ2' from dual union all
      4    select '56QA7 ' from dual union all
      5    select '56Q7 ' from dual union all
      6    select 'ABCQA' from dual)
      7  select * from t
      8  where regexp_like(element_name,'\Q[0-9]')
      9  /
    ELEMEN
    00Q1
    ABCQ2
    56Q7
    Or something like this:
    SQL> with t as
      2  ( select '00Q1' element_name from dual union all
      3    select 'ABCQ2' from dual union all
      4    select '56QA7 ' from dual union all
      5    select '56Q7 ' from dual union all
      6    select 'ABCQA' from dual)
      7  select t.*, DECODE(regexp_instr(element_name,'\Q[0-9]'),0,'Not Found',element_name ) comments
      8  from t
      9  /
    ELEMEN COMMENTS
    00Q1   00Q1
    ABCQ2  ABCQ2
    56QA7  Not Found
    56Q7   56Q7
    ABCQA  Not Found
    Edited by: Saubhik on May 18, 2010 6:41 AM

  • A simple regular expression....

    What I need to do is find occurrences of |A in a string and replace them with another string. This sounds like a perfect use of the String.replaceAll() method! So I gave it a try, and failed. The code String.replaceAll("|A", "ReplacedText"); doesn't actually replace |A. I was wondering if I need an escape character before the pipe or something, since I believe the pipe is used in regex logic. Thanks in advance for any help.

    You're right, you need to do:
    newString = oldString.replaceAll("\\|A", replacementText);
    ...using the double backslash to ensure that a single backslash gets into the actual regex.
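
As an alternative to escaping by hand, the literal can be quoted with the standard library. This is a small sketch with hypothetical class and method names; Pattern.quote and Matcher.quoteReplacement are standard java.util.regex calls. An unescaped | means "or", so the regex "|A" reads as "empty string or A", which is why the original call misbehaved.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacePipe {
    static String replacePipeA(String input, String replacement) {
        // Pattern.quote makes "|A" a literal, so '|' loses its alternation meaning.
        // Matcher.quoteReplacement keeps '$' and '\' in the replacement literal as well.
        return input.replaceAll(Pattern.quote("|A"), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(replacePipeA("x|Ay|A", "ReplacedText")); // xReplacedTextyReplacedText
        // String.replace does a plain literal replacement and needs no escaping at all:
        System.out.println("x|Ay|A".replace("|A", "ReplacedText")); // xReplacedTextyReplacedText
    }
}
```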

  • Help me!   Give me a  Regular Expressions for Email and URL!

    see title!
    Thank you very much!

    URL pattern:
    String proto = "(\\w+)";
    String userPassAt = "(\\w+)(?::(\\w+))?@"; // user, optional ":password", then '@'
    String host = "([^\\s:/\\<>]+)";
    String port = ":(\\d+)\\b";
    String file = "/([^\\s:?]*)"; // leading slash not included in the group
    String query = "\\?(\\S*)"; // '?' not included in the group
    String urlPattern = proto + ":" + possibly("//") + possibly(userPassAt) + host + possibly(port) + possibly(file + possibly(query));
    // Wraps a sub-pattern in an optional non-capturing group.
    static String possibly(String s) {
        return "(?:" + s + ")?";
    }
    Group numbers:
    protocol = 1
    user = 2
    pass = 3
    host = 4
    port = 5
    file = 6
    query = 7
    Regards
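
A self-contained sketch of how the pieces above could be assembled and the group numbers read back. It assumes the user-info part is written as (\\w+)(?::(\\w+))?@, i.e. "user" or "user:pass" followed by '@'; the class name UrlParts and the sample URL are made up for illustration. It is not a full RFC 3986 parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UrlParts {
    // Wraps a sub-pattern in an optional non-capturing group, as in the post.
    static String possibly(String s) {
        return "(?:" + s + ")?";
    }

    // Groups: 1 protocol, 2 user, 3 pass, 4 host, 5 port, 6 file, 7 query.
    static final Pattern URL = Pattern.compile(
            "(\\w+)" + ":" + possibly("//")
            + possibly("(\\w+)(?::(\\w+))?@")   // assumed form of the user-info part
            + "([^\\s:/\\<>]+)"
            + possibly(":(\\d+)\\b")
            + possibly("/([^\\s:?]*)" + possibly("\\?(\\S*)")));

    public static void main(String[] args) {
        Matcher m = URL.matcher("http://bob:secret@example.com:8080/a/b?x=1");
        if (m.matches()) {
            // Optional parts that are absent come back as null.
            for (int g = 1; g <= m.groupCount(); g++) {
                System.out.println(g + " = " + m.group(g));
            }
        }
    }
}
```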

  • Simple Regular Expression

    I am not sure what is wrong with the code. As seen below, I am checking whether the string is alphanumeric, so "#$%#%" checked against the pattern "[^a-zA-Z0-9]" should return true and print "Failed". Instead, the code below prints "Success".
    public class TestRE {
         public static void main(String[] args) {
              String str = "#$%#%";
              if (str.matches("[^a-zA-Z0-9]"))
                   System.out.println("Failed");
              else
                   System.out.println("Success");
         }
    }

    With matches(), this regex succeeds only when the whole string is a single non-alphanumeric character. To check for any occurrence of a special character, we need the prefix .*? and the suffix .* as in
    if (str.matches(".*?[^a-zA-Z0-9]+.*"))
    or
    if (str.matches(".*?\\P{Alnum}+.*"))
    My understanding is that matches() considers the whole string for the match. If any part of the string is not covered by the pattern, the condition fails.
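
The difference between whole-string and substring matching can be sketched as follows, using Matcher.find() instead of padding the pattern with .* on both sides. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class NonAlnumCheck {
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    static boolean hasSpecialChar(String s) {
        // find() succeeds if any part of the input matches;
        // matches() would require the whole input to match the pattern.
        return NON_ALNUM.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpecialChar("#$%#%"));             // true
        System.out.println(hasSpecialChar("abc123"));            // false
        System.out.println("#$%#%".matches("[^a-zA-Z0-9]"));     // false: five characters, pattern covers one
        System.out.println("#".matches("[^a-zA-Z0-9]"));         // true: exactly one non-alphanumeric character
    }
}
```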

  • Simple regular expression/perl/sed type question

    If I have a string like this:
    1. e4 c5 2. Nf3 Nc6 3. Bc4 e6 4. c3 Nge7 5. d4 d5
    and I want to put a line break after every third "word" (i.e., before 2., 3., 4., 5. ...)
    how do I do it?

    Find:
    \s(\d+\.\s)
    Replace:
    $1
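
A sketch of the same find/replace in Java, under two assumptions: that the Find pattern is meant as \s(\d+\.\s) (the posted text appears to have lost its backslashes), and that the text to insert is a newline. The class and method names are hypothetical.

```java
class MoveBreaks {
    static String breakBeforeMoveNumbers(String moves) {
        // \s(\d+\.\s): one whitespace character, then a move number such as "2. " captured as group 1.
        // The replacement swaps that whitespace for a newline and keeps the move number.
        return moves.replaceAll("\\s(\\d+\\.\\s)", "\n$1");
    }

    public static void main(String[] args) {
        String game = "1. e4 c5 2. Nf3 Nc6 3. Bc4 e6 4. c3 Nge7 5. d4 d5";
        // Prints one numbered move pair per line.
        System.out.println(breakBeforeMoveNumbers(game));
    }
}
```

The first move number is left alone because no whitespace precedes it, and squares such as e4 or c5 are untouched because they are not followed by a dot.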

  • Little help with regular expression?

    Greetings all,
    I have a simple regular expression "(\\w+)\\s(\\w+)\\s(.+)"
    Which I want to match against the strings like "Acetobacter pasteurianus LMD22.1"
    But this always fails whenever there is a dot (.) character like "LMD22.1" in above string.
    How to solve this ?
    Thanks in advance.

    Shouldn't that be Acinetobacter?
    edit: nope, I'm wrong, you're right.
    Edited by: Encephalopathic on Apr 7, 2009 7:34 PM
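
As a quick check, assuming the pattern is used with java.util.regex exactly as posted: the third group is (.+), and '.' matches any character including a literal dot, so the sample string is matched. A failure on "LMD22.1" would be expected only if the last group were (\\w+), since \w does not match a dot. The class name below is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StrainName {
    // Pattern exactly as posted: word, whitespace, word, whitespace, then the rest.
    static final Pattern NAME = Pattern.compile("(\\w+)\\s(\\w+)\\s(.+)");

    public static void main(String[] args) {
        Matcher m = NAME.matcher("Acetobacter pasteurianus LMD22.1");
        System.out.println(m.matches());  // true
        System.out.println(m.group(3));   // LMD22.1
    }
}
```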

  • URL paths and regular expressions in ASDM

    Some background info - I've recently switched to an ASA 5510 on 8.4(3) coming from a Checkpoint NGX platform (let's say fairly quickly and without much warning ). I have a couple questions and they're kind of similar so I'll post them up. I've read docs about regex and creating them both via command line and ASDM, but the examples always seem to include info I don't need or honestly something I don't understand yet (mainly related to defining class\inspect maps). If someone could provide a simple example of how to do these in ASDM that would help a lot in understanding how regular expressions are properly configured. So here we go.
    I know this is basic but I need to make sure I understand this properly - I have a single web server (so this won't be a global policy) where I need to allow access to a specific URL path\file and that's it. So we'll call it \test\testfile.doc. Any other access to any other path should be dropped. What's the best way to do this in ASDM (6.4)? I think if I saw a basic example for this I could figure out next few questions but I'll post them as well just in case.
    I have another single public web server (again this won't be a global policy) where I'd like to specify blocking file types, like .php, .exe., etc... again a basic example would be great.
    Lastly, and this is kind of related, but we have a single office/domain and sometimes we get spam from forged addresses appearing to be from our domain. On Checkpoint I used to use its built-in SMTP security server and could define if it received mail from *@mydomain.com to drop it because we would never receive mail externally from our own domain name. I saw something similar with ESMTP in ASDM and it looks kind of like how you set up the URL access mentioned above. Can I configure this in ASDM as well, and if so how?
    TIA for your help,
    Jordan

    /bump

  • Regular expression in B2B Document editor

    Hi All,
    I intend to parse a file coming into B2B in which one record entry starts with ' 88', and I want to use this value 88 to differentiate it from other records. I have set a rule (^[ \t]+88) on the field to pick up this entry and set the value of the tag field to 88.
    But it is not being picked up properly; the parser jumps over this record completely. Could you please explain how regular expression rules work when used with a tag? I have to use only the rule, but the record is still not being picked up properly.
    Kindly share any assistance in this regard.
    thanks
    Rakesh

    Hi Datla,
    Yes, there is de-identification support in the Data Editor of B2B Document Editor. Once you open an EDI or HL7 document with the Data Editor, it will ask you to "Choose De-Identification and specify rule file". You may create a separate file for your use. A Data Replacement Rule (DRR) file is actually an XML file that holds the separator information along with the data to be replaced. You may define your own DRR file.
    To know more, just open the Data Editor from Document Editor, go to Help --> Content -->Data Replace and De-Identify section.
    Regards,
    Anuj

  • Regular expression that excludes an integer

    Does someone know if there is a simple regular expression pattern which can be used in an XML schema as a restriction to exclude a few integers from the entire integer set?
    For example, if I want to use the schema to validate an XML document which has an element called 'playerId' whose value can be any integer BUT 1000, the schema segment for the validation could be like:
    <xsd:restriction base="xsd:integer">
    <xsd:pattern value="<<pattern string>>"/>
    </xsd:restriction>
    what <<pattern string>> can I use to validate the value IS NOT 1000?
    I tried a few such as ((\d*)-(1000)), [^(1000)], \d*[^(1000)], none worked. Any help will be greatly appreciated.

    Why don't you derive from an integer type instead of a string? I know this seems ridiculously verbose for such a simple restriction, but it should do what you want:
    <xsd:attribute name="root">
      <xsd:simpleType>
        <xsd:union>
          <xsd:simpleType>
            <xsd:restriction base="xsd:nonNegativeInteger">
              <xsd:maxInclusive value="999"/>
            </xsd:restriction>
          </xsd:simpleType>
          <xsd:simpleType>
            <xsd:restriction base="xsd:positiveInteger">
              <xsd:minInclusive value="1001"/>
            </xsd:restriction>
          </xsd:simpleType>
        </xsd:union>
      </xsd:simpleType>
    </xsd:attribute>
