Regular expressions and string matching

Hi everyone,
Here is my problem: I have a partially decrypted string which looks something like this.
Partially decrypted:    the?anage??esideshe?e
Plain text:             themanagerresideshere
So you can see that there are a few letters missing from the decrypted text. What I am trying to do is insert spaces into the string so that I get:
                    The ?anage? ?esides he?e
I have a method which splits the string into substrings of varying lengths, compares each substring with a word from a dictionary (implemented as an ArrayList), and then inserts a space.
The problem is that my function does not find the words in the dictionary because my string is only partially decrypted.
     E.g.:   ?anage? is not stored in the dictionary, but the word "manager" is.
So my question is: is there a way to build a regular expression which would match the partially decrypted text with a word from a dictionary (i.e. ?anage? is recognised as "manager" from the dictionary)?

I wrote the following method in order to test the matching using . in my regular expression.
public void getWords(int y) {
    int x = 0;
    for (; y < buff.length(); y++) {
        // buff holds the partially decrypted text
        String strToCompare = buff.substring(x, y);
        x++;
        Pattern p = Pattern.compile(strToCompare);
        for (int z = 0; z < dict.size(); z++) {
            // dict holds all the words in the dictionary
            String str = (String) dict.get(z);
            Matcher m = p.matcher(str);
            if (m.matches()) {
                System.out.println(str);
                System.out.println(strToCompare);
                // System.out.println(buff);
            }
        }
    }
}
If I run the method where my parameter = 12, I am given the following output.
aestheticism
aestheti.is.
demographics
de.o.ra.....
Which suggests that the method is working correctly.
However, after running for a short time, the method stops and gives me the error:
PatternSyntaxException:
Null(in java.util.regex.Pattern).
Does anyone know why this would occur?
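
One possible cause, offered as a guess since the failing input is not shown: the raw substring is passed straight to Pattern.compile, so any character with a special meaning in a regex (a leading ? for instance) makes the pattern invalid and raises PatternSyntaxException. The sketch below assumes unknown letters are marked with ? as in the example at the top of the post. It maps each ? to the regex wildcard . and quotes every other character so it is matched literally. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PartialWordMatcher {

    // Builds a regex from a fragment such as "?anage?": each '?' becomes '.'
    // (any single character); every other character is quoted so that it is
    // matched literally and can never be read as regex syntax.
    static Pattern toPattern(String fragment) {
        StringBuilder regex = new StringBuilder();
        for (char c : fragment.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // Returns every dictionary word that the whole fragment could stand for.
    static List<String> candidates(String fragment, List<String> dictionary) {
        Pattern p = toPattern(fragment);
        List<String> hits = new ArrayList<>();
        for (String word : dictionary) {
            if (p.matcher(word).matches()) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical four-word dictionary, just for the demonstration.
        List<String> dictionary = Arrays.asList("the", "manager", "resides", "here");
        System.out.println(candidates("?anage?", dictionary));
        System.out.println(candidates("he?e", dictionary));
    }
}
```

Because matches() has to consume the whole word, a fragment only ever matches dictionary words of the same length, which is the behaviour the original method relies on.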

Similar Messages

  • Regular Expressions and String

    How do I return a String array as follows using a regular expression?
    String[] strArray = {"Now is the time", "you can optionally preview your post","message by using a number of special tokens."}
    from this string
    <separator>Now is the time</separator><separator>you can optionally preview your post</separator><separator>message by using a number of special tokens.</separator>
    Note: The string has the <separator> XML tag

    This cannot be done using a simple regular expression, at least not if your number of <separator>s is variable, which is what you seem to imply.
    A simple regular expression is a one-off match: it can yield a String array as a result, but only with as many entries as there are capturing groups (brackets) in the regex.
    A regex like:
    <separator>([^<]*)</separator><separator>([^<]*)</separator><separator>([^<]*)</separator>
    would return what you want, but I doubt that it is as flexible as you want it to be.
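
The group-count limit above applies to a single match. One way around it, sketched here as a suggestion rather than as the approach the reply had in mind, is to use a pattern for just one <separator> element and apply it repeatedly with Matcher.find(), collecting group 1 each time. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorExtractor {

    // Matches one <separator>...</separator> element and captures its text.
    private static final Pattern ITEM =
            Pattern.compile("<separator>([^<]*)</separator>");

    // Collects the captured text of every element, however many there are.
    static String[] extract(String input) {
        List<String> items = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String input = "<separator>Now is the time</separator>"
                + "<separator>you can optionally preview your post</separator>"
                + "<separator>message by using a number of special tokens.</separator>";
        for (String s : extract(input)) {
            System.out.println(s);
        }
    }
}
```

This only holds up while the text between the tags contains no < character; for anything closer to real XML a parser is the safer tool.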

  • Regular expressions and limiting matched input

    Hi everyone :) I am trying to put together a regular expression that matches strings that contain elements of the form;
    {<some text>}
    However, each piece of text may contain multiple embedded instances of this pattern. I want to ensure that I am always getting the first (or outermost) instance.
    So, if I had;
    {OneStart}{TwoStart}{TwoEnd}{OneEnd}
    I want to make sure that I get 'One' first and 'Two' second. So I have to place a stipulation in my regular expression to match this pattern only if it has not located the pattern previously.
    At the moment, I have this -
    ([^\\{]*?)(\\{TagStart\\})(.*?)(\\{TagEnd\\})(.*)
    What I think that I have to do is modify the first capture group '([^\\{]*?)' which at the moment only does not match if a preceding '{' is found to match only if a preceding '{<text>}' sequence is not found.
    Anyone got any idea how to do this?
    Thanks in advance.
    Ben

    Doesn't it work anyway, since you're using greedy operators? If not, won't it work if you remove the .* at the end and use find() rather than matches()? And finally, what's the (.*?) supposed to match? Looks to me like that should be .*

  • Regular expression and pattern matching/replacing

    I have a list of keywords. It has around 1000 keywords now but can grow to 5000.
    My web application displays a lot of text which is stored in the database. My requirement is to scan each text for the occurrence of any of the above keywords. If any keyword is present I have to replace it with some custom value before showing the text to the user.
    I was thinking of using a regular expression to replace the keywords in the text, using the matcher.replaceAll method as follows:
    Pattern pattern = Pattern.compile(patternStr);
    Matcher matcher = pattern.matcher(inputStr);
    String output = matcher.replaceAll(replacementStr);
    But my pattern string will have around 5000 keywords joined with the 'OR' logical operator, like keyword1 | keyword2 | keyword3 | ...
    Will such a big pattern string adversely affect performance? What can I do to speed it up? (Since my keyword list is not static, I would prefer to do the replacement just before showing the text to the user.)
    Any suggestions are most welcome.

    I don't think a pure regex approach would be that slow, but it would be a maintenance nightmare. I think a combined regex/table-lookup approach would be best: use a regex to identify potential keywords, then look them up in the table to confirm. For instance, to find all Java keywords you could use the regex "\\b[a-z]{2,12}+\\b" to filter out anything that can't possibly be a keyword.
    What are you going to replace the keywords with? Will it vary depending on which keyword is found? If so, you'll have to use a table, and you won't be able to use the replaceAll method, because it can't handle dynamically generated replacement values. You would have to use the lower-level appendReplacement and appendTail methods instead.
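
A minimal sketch of the combined approach described above, assuming the keywords are single words and the replacements live in a Map (both assumptions, as are the class and method names). A cheap regex finds candidate words, the map decides whether each is a keyword, and appendReplacement/appendTail rebuild the text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordReplacer {

    // Candidate filter: any run of word characters between word boundaries.
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    // Replaces every word that appears as a key in the map with its value;
    // all other text is copied through unchanged.
    static String replaceKeywords(String text, Map<String, String> replacements) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = replacements.get(m.group());
            if (replacement == null) {
                replacement = m.group(); // not a keyword: keep the word as is
            }
            // quoteReplacement stops '$' and '\' in the value from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("java", "Java(TM)");
        table.put("tea", "$tea");
        System.out.println(replaceKeywords("buy java and tea", table));
    }
}
```

With this shape the pattern stays the same size however many keywords there are, and the cost of growing from 1000 to 5000 keywords falls on the hash lookup rather than on the regex engine.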

  • Regular Expressions and String variables

    Hi,
    I am attempting to implement a system for searching text files for regular expression matches (similar to something like TextPad, etc.).
    Looking at the regular expression API, it appears that you can only match using string variables. I just wanted to make sure this is true. Some of these files might be large and I feel uneasy about loading them into ginormous Strings. Is this the only way to do it? Can I make a String as big as I want?
    Thanks,
    -Mike

    Newlines are only a problem if you're reading the
    text line-by-line and applying the regexp to each
    line. It wouldn't catch expressions that span
    lines.
    @sabre150: your note re: CharSequence -- so what
    you're suggesting is to implement a CharSequence that
    wraps the file contents, and then use the regexps on
    the whole thing? I like the idea but it seems like
    it would only be easy to implement if the file uses a
    fixed-width character set. Or am I missing
    something...?

    You are correct for the most basic implementation. It is very easy to create a CharSequence for fixed-width character sets using RandomAccessFile. Once you go to character sets such as UTF-8, more effort is required.
    As long as the regex is moving forward through the CharSequence one char at a time there is no problem, because one can wrap a Reader. Once it backtracks, though, one needs random access and therefore a buffer. I have used a ring buffer for this, which seems to work OK, but of course it will not allow the regex to move to any point in the CharSequence.
    'uncle_alice' is the regex king round here so listen to him.
    :-( I should read further ahead next time!
    Message was edited by:
    sabre150

  • An additional question about regular expressions with String.matches

    Does the String.matches() method match when some substring of the String matches, or does it have to match the entire String? So, if I have the String "123ABC" and I ask to match "1 or more letters", will it fail because there are non-letters in the String, but then pass if I ask for "1 or more letters AND 1 or more digits"? In the latter, every character in the String is accounted for in the search, as opposed to the first. Is that correct, or are there ways to match JUST some substring of the String instead of the whole thing? I WILL make some examples too... but does that make sense?

    It has to match the whole String. Use Matcher.find() to match on just a substring.
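
A small sketch of the difference, using the "123ABC" example from the question. The helper names are made up; String.matches and Matcher.find are the standard calls.

```java
import java.util.regex.Pattern;

class MatchesVsFind {

    // True only if the regex accounts for every character of the input.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // True if the regex matches any substring of the input.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String s = "123ABC";
        System.out.println(wholeMatch("[A-Za-z]+", s));        // false: the digits are left over
        System.out.println(wholeMatch("[0-9]+[A-Za-z]+", s));  // true: digits then letters cover it all
        System.out.println(partialMatch("[A-Za-z]+", s));      // true: "ABC" is enough for find()
    }
}
```

Note that the whole-string pattern has to list digits and letters in the order they occur; "[A-Za-z]+[0-9]+" would not match "123ABC" with matches().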

  • Regular Expressions and string handling

    I'd like to be able to make a method that takes a String and removes certain substrings from that string. What I'm doing is that I'm getting some text from the internet (an HTML doc) and I'd like to remove all the tags from it.
    My method is:
         String DeTokenString(String s) {
              String t = s.replaceAll("<regex>", "");
              return t;
         }
    What would I put in place of <regex> to remove all html tags? s has already had its leading and following whitespace removed elsewhere with .trim() before being passed to DeTokenString.
    Is there any other simple way to accomplish this?

    It does on whatever manages to get through my 17
    firewall, hand-woven packet destroyers, and
    titanium-lead armor.

    So! That is your final defense mechanism. Bwah hah hah hah hah! Now I have you. That was all I needed to improve my 17-firewall-sneaking-through, hand-woven-packet-destroyer-unweaving, titanium-lead-armor-piercing virus!
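
Back on the tag-stripping question: a commonly used pattern is "<[^>]*>", which matches a < followed by anything up to the next >. The sketch below is a rough approach under the assumption that the markup is simple; it will misfire on comments, scripts, or attribute values that contain a > character, where an HTML parser is the more reliable choice. The method name follows the one in the question, lower-cased per Java convention.

```java
class TagStripper {

    // Removes anything that looks like <...> from the string.
    static String deTokenString(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        System.out.println(deTokenString("<p>Hello <b>world</b></p>"));
    }
}
```

There is no need to wrap the result in new String(...), since replaceAll already returns a new String.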

  • "Match Regular Expression" and "Match Pattern" vi's behave differently

    Hi,
    I have a simple string matching need and by experimenting found that the "Match Regular Expression" and "Match Pattern" vi's behave somewhat differently. I'd assume that the regular expression inputs on both would behave the same. A difference I've discovered is that the "|" character (the "vertical bar" character, commonly used as an "or" operator) is recognized as such in the Match Regular Expression vi, but not in the Match Pattern vi (where it is taken literally). Furthermore, I cannot find any documentation in Help (on-line or in LabVIEW) about the "|" character usage in regular expressions. Is this documented anywhere?
    For example, suppose I want to match any of the following 4 words: "The" or "quick" or "brown" or "fox". The regular expression "The|quick|brown|fox" (without the quotes) works for the Match Regular Expression vi but not the Match Pattern vi. Below is a picture of the block diagram and the front panel results:
    The Help says that the Match Regular Expression vi performs somewhat slower than the Match Pattern vi, so I started with the latter. But since it doesn't work for me, I'll use the former. But does anyone have any idea of the speed difference? I'd assume it is negligible in such a simple example.
    Thanks!

    Yep-
    You hit a point that's frustrated me a time or two as well (and incidentally, caused some hair-pulling that I can ill afford)
    The hint is in the help file:
    for Match regular expression "The Match Regular Expression function gives you more options for matching
    strings but performs more slowly than the Match Pattern function....Use regular
    expressions in this function to refine searches....
    Characters to Find                                  Regular Expression
    VOLTS                                               VOLTS
    A plus sign or a minus sign                         [+-]
    A sequence of one or more digits                    [0-9]+
    Zero or more spaces                                 \s* or " *" (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage    [\t \r \n \s]+
      returns
    One or more characters other than digits            [^0-9]+
    The word Level only if it appears at the beginning  ^Level
      of the string
    The word Volts only if it appears at the end        Volts$
      of the string
    The longest string within parentheses
    The first string within parentheses but not         \([^()]*\)
      containing any parentheses within it
    A left bracket
    A right bracket
    cat, cag, cot, cog, dat, dag, dot, and dog          [cd][ao][tg]
    cat or dog                                          cat|dog
    dog, cat dog, cat cat dog, cat cat cat dog,         ((cat )*dog)
      and so on
    One or more of the letter a followed by a space     (a+) \1
      and the same number of the letter a, that is,
      a a, aa aa, aaa aaa, and so on
    For Match Pattern "This function is similar to the Search and Replace
    Pattern VI. The Match Pattern function gives you fewer options for matching
    strings but performs more quickly than the Match Regular Expression
    function. For example, the Match Pattern function does not support the
    parenthesis or vertical bar (|) characters.
    Characters to Find                                  Regular Expression
    VOLTS                                               VOLTS
    All uppercase and lowercase versions of volts,      [Vv][Oo][Ll][Tt][Ss]
      that is, VOLTS, Volts, volts, and so on
    A space, a plus sign, or a minus sign               [+-]
    A sequence of one or more digits                    [0-9]+
    Zero or more spaces                                 \s* or " *" (that is, a space followed by an asterisk)
    One or more spaces, tabs, new lines, or carriage    [\t \r \n \s]+
      returns
    One or more characters other than digits            [~0-9]+
    The word Level only if it begins at the offset      ^Level
      position in the string
    The word Volts only if it appears at the end        Volts$
      of the string
    The longest string within parentheses
    The longest string within parentheses but not       ([~()]*)
      containing any parentheses within it
    A left bracket
    A right bracket
    cat, dog, cot, dot, cog, and so on.                 [cd][ao][tg]
    Frustrating, but still manageable.
    Jeff

  • Regular expression and XML

    Hello,
    I have an XML file containing regular expressions. I parse the file, extract each pattern from it and search for it using the Java regex package. The problem is that it works fine when the patterns are words, but when the pattern is something like
    write \\d+ (write followed by a space followed by one or more digits) it doesn't work.
    I wrote the same code but with the pattern embedded in it, i.e. without using XML, and it worked. But when extracting from XML it fails.
    Also, if the pattern is write[0-9], it only extracts write[0-9 and gives an error about a missing closing bracket.
    Could anyone please tell me what I am missing?
    Thank you

    Thank you for your replies. Well, I still have not got past the problem, so I am posting my code here in the hope that it can be solved.
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import java.io.*;
    import java.util.regex.*;

    class textextractor extends DefaultHandler {
        boolean regex = false;

        public void startElement(String namespaceURI, String localName, String qn, Attributes attr) {
            if (localName.equals("REGEX"))
                regex = true;
        }

        public void characters(char[] text, int start, int length) throws SAXException {
            String t = new String(text, start, length);
            boolean flag = false;
            if (regex == true) {
                Pattern pattern;
                String w = new String(t);
                pattern = Pattern.compile(w);
                Matcher matcher;
                matcher = pattern.matcher("there is a bat   read  write 13    error at line ");
                while (matcher.find()) {
                    flag = true;
                    System.out.println("I found the text \"" + matcher.group() + "\" starting at index "
                            + matcher.start() + " and ending at index " + matcher.end() + ".");
                }
                if (!flag)
                    System.out.println("not found");
                regex = false;
            }
        }
    }

    public class saxt2 {
        public static void main(String args[]) {
            try {
                XMLReader parser = XMLReaderFactory.createXMLReader();
                ContentHandler handler = new textextractor();
                parser.setContentHandler(handler);
                parser.parse("d:\\regex.xml");
            } catch (Exception e) {
                System.err.println(e);
            }
        }
    }

    The XML file is
                      <RegularExpression>
                      <REGEX>write</REGEX>
                      <REGEX>write \\d+</REGEX>
                      <REGEX>read[0-9]</REGEX>
                      </RegularExpression>

    By running the code you can see that write is found, write \\d+ doesn't match write 13 in the string, and read[0-9] gives an error.
    Any help will be greatly appreciated
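
Two likely causes, offered as a reading of the symptoms rather than a confirmed diagnosis. First, the backslash is not an escape character in XML, so \\d+ in the file reaches Pattern.compile as two backslashes, which the regex engine reads as a literal backslash followed by the letter d; the doubling is only needed inside Java string literals, and the file should contain \d+. Second, a SAX parser is allowed to deliver the text of one element through several characters() calls, so compiling inside characters() can see a fragment such as read[0-9 on its own. The sketch below buffers the text and uses it in endElement instead. It takes the XML from a string and uses SAXParserFactory to stay self-contained; the class name and the parse helper are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class RegexCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inRegex = false;
    final List<String> patterns = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("REGEX")) {
            inRegex = true;
            text.setLength(0); // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inRegex) {
            text.append(ch, start, length); // may be called more than once per element
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("REGEX")) {
            patterns.add(text.toString()); // the element text is complete only here
            inRegex = false;
        }
    }

    // Parses the XML and returns the text of every REGEX element.
    static List<String> parse(String xml) {
        try {
            RegexCollector handler = new RegexCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.patterns;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\\d" in this Java literal is a single backslash in the XML text.
        String xml = "<RegularExpression><REGEX>write</REGEX>"
                + "<REGEX>write \\d+</REGEX><REGEX>read[0-9]</REGEX></RegularExpression>";
        String input = "there is a bat   read  write 13    error at line ";
        for (String p : parse(xml)) {
            System.out.println(p + " -> " + Pattern.compile(p).matcher(input).find());
        }
    }
}
```

The handler compares qName because the parser created this way is not namespace-aware by default, in which case localName can be empty.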

  • Regular expressions and sql

    I have working regular expressions and a working SQL connection, but I don't know how to stop the info from getting into the database when the input doesn't match the regular expression.
    For instance, you put in an e-mail without an "@" and my program writes an error message. But the info still gets into the database.
    Any help would be much appreciated as I don't know where to start. If you have links or code examples, that would be great too.
    Thanks.

    Well, the obvious answer is "only write the data to the database if the input matches the regular expression."
    Presumably you're really asking how to do that - but it depends upon how your application is structured in the first place, and you haven't told us anything at all about that.
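
Without knowing the application's structure, only the general shape can be sketched: do the validation first and return before any database code runs when it fails. Everything below is hypothetical. The e-mail pattern is deliberately crude, and a List stands in for the real table so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ValidatedInsert {

    // Crude check: something, an '@', then something, with no spaces.
    private static final Pattern EMAIL = Pattern.compile("[^@\\s]+@[^@\\s]+");

    // Stand-in for the database table; the real INSERT would replace this.
    final List<String> savedRows = new ArrayList<>();

    // Returns true if the value was stored, false if it was rejected.
    boolean save(String email) {
        if (!EMAIL.matcher(email).matches()) {
            System.out.println("Invalid e-mail: " + email);
            return false; // leave before any database call is made
        }
        savedRows.add(email); // hypothetical: run the INSERT statement here
        return true;
    }

    public static void main(String[] args) {
        ValidatedInsert app = new ValidatedInsert();
        app.save("no-at-sign.example.com");
        app.save("user@example.com");
        System.out.println(app.savedRows);
    }
}
```

The point is the control flow: printing the error message and writing to the database must sit on different branches, not one after the other.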

  • What should be the regular expression for string MT940_UB_*.txt to be used in SFTP sender channel in PI 7.31 ??

    Hi All,
    What should be the regular expression for string MT940_UB_*.txt and MT940_MB_*.txt to be used as filename in SFTP sender channel in PI 7.31 ??
    If any one has any idea on this please let me know.
    Thanks
    Neha

    Hi All,
    None of the file names suggested is working.
    I have tried using - MT940_MB_*\.txt , MT940_MB_*.*txt , MT940*.txt
    None of them is able to pick this filename - MT940_MB_20142204060823_1.txt
    Currently I am using generic regular expression which picks all .txt files. - ([^\s]+(\.(txt))$)
    Let me know your suggestions on this.
    Thanks
    Neha Verma
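
A note on why the attempts above fail, shown with Java's regex engine on the assumption that the channel uses a similar dialect and matches the whole file name (neither is confirmed here). In a regular expression * is not a wildcard by itself: it repeats the item before it, so MT940_MB_*\.txt means "MT940_MB", any number of underscores, then ".txt". The regex counterpart of the file-name wildcard * is .* (any character, repeated).

```java
import java.util.regex.Pattern;

class Mt940FileNames {

    // MT940_UB_ or MT940_MB_, then anything, then a literal ".txt".
    static final Pattern MT940 = Pattern.compile("MT940_(UB|MB)_.*\\.txt");

    // True if the whole file name fits the pattern.
    static boolean accepts(String fileName) {
        return MT940.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String name = "MT940_MB_20142204060823_1.txt";
        System.out.println(accepts(name));
        // One of the attempted patterns, for comparison:
        System.out.println(Pattern.matches("MT940_MB_*\\.txt", name));
    }
}
```

If the channel takes one expression per file type, MT940_UB_.*\.txt and MT940_MB_.*\.txt are the two separate forms of the same idea.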

  • Regular Expression and PL/SQL help

    I am using Oracle 9i. Does 9i support regular expressions? What functions are there?
    My problem is that the birth_date column in my database comes from Teleform (a scan program that reads what people wrote on paper), so the format is all jacked up: 50% of them are 01/01/1981, 10% are 5/14/1995, 10% are 12/5/1993, 10% are 1/1/1983, 10% are 24-JUL-98. I have never really used regular expressions and PL/SQL. Can anybody help me convert all of them to the 01/01/1998 format?
    Does Oracle 9i support regular expressions? What can I do if Oracle 9i does not support them? Thank you very much in advance.

    9i doesn't support regular expressions, at least not in the 10g regular expressions sense (there is an OWA_PATTERN package that has some facilities for regular expressions). But it doesn't look like this is a regular expressions problem.
    Instead, this is probably a case where you need to
    - enumerate the format masks you want to try
    - determine the order you want to try them
    - write a small function that tries each format mask in succession until one matches.
    Of course, there is no guarantee that you'll ever be able to convert the data to the date that the user intended because some values will be ambiguous. For example, 01/02/03 could mean Feb 1, 2003 or Jan 2, 2003 or Feb 3, 2001 depending on the person who entered the data.
    Assuming you can define the order, your function would just try each format mask in turn until one generated a valid date, i.e.
    BEGIN
      BEGIN
        l_date := TO_DATE( p_string_value, format_mask_1 );
        RETURN l_date;
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
      BEGIN
        l_date := TO_DATE( p_string_value, format_mask_2 );
        RETURN l_date;
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
      BEGIN
        l_date := TO_DATE( p_string_value, format_mask_3 );
        RETURN l_date;
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
      BEGIN
        l_date := TO_DATE( p_string_value, format_mask_N );
        RETURN l_date;
      EXCEPTION
        WHEN OTHERS THEN
          NULL;
      END;
      RETURN NULL;
    END;
    Justin

  • Regular expressions and backreference

    Hello!
    I am trying to use backreferences in REGEXP in the Perl style, where I want to match my regular expression and later refer to the grouped values. I have read that those are referenced with \1 .. \9, but I simply can't get it to work. Here is an example in PL/SQL:
    SELECT REGEXP_SUBSTR(l_users.adresse,'([A-Z]+)\s+(\d+)')
    INTO l_dummy_varchar2
    FROM dual;
    OR I could do things like:
    l_dummy_varchar2 := REGEXP_SUBSTR(l_users.adresse,'([A-Z]+)\s+(\d+)');
    It seems to work, but I can't figure out how to get the backreferenced value.
    I would love to do things like:
    dbms_output.put_line('my value ='||\1)
    but this doesn't work.
    Help is very much appreciated.
    Best regards
    Dannie

    Likewise you can extract things using the
    REGEXP_SUBSTR, but you don't need back
    referencing...

    Backreferencing is better than using an additional function (ltrim), and BTW be careful with these "ltrims":
    SQL> set serveroutput on
    SQL>
    SQL> DECLARE
      2       v_txt VARCHAR2(100);
      3     BEGIN
      4       v_txt := ltrim(regexp_substr('HERE IS AN ASCII CHARACTER', 'IS AN [[:alnum:]]*'),'IS AN ');
      5       DBMS_OUTPUT.PUT_LINE('Word after IS AN: '||v_txt);
      6  END;
      7  /
    Word after IS AN: CII
    PL/SQL procedure successfully completed
    SQL>
    SQL> DECLARE
      2       v_txt VARCHAR2(100);
      3     BEGIN
      4       v_txt := regexp_replace('HERE IS AN ASCII CHARACTER', 'IS AN ([[:alnum:]]*)|.','\1');
      5       DBMS_OUTPUT.PUT_LINE('Word after IS AN: '||v_txt);
      6  END;
      7  /
    Word after IS AN: ASCII
    PL/SQL procedure successfully completed
    SQL> -----------
    VB
    http://volder-notes.blogspot.com/

  • Juniper MX Regular expressions and user permissions ACS 5.4

    Hi everyone!
    I'm having some trouble with regular expressions and permissions on our Juniper MX routers through ACS 5.4, and I would like some insight/help/pointers!
    We have a team of engineers that should have read-only permissions (important: show configuration) and also be able to change just the description on interfaces.
    So far, with the following regular expressions set for the shell profile they go through, I have managed the above. However, the problem is that when an engineer inputs "show configuration", only the interface description configuration is shown! The rest of the configuration is not printed.
    deny-commands1=.*.
    allow-commands1=configure
    deny-configuration1=.*.
    allow-commands2=interfaces .*. description .*$
    allow-configuration1=interfaces .*. description .*$
    allow-commands2=show configuration.*
    allow-commands3=show configuration
    (I know that some of these regexes are not needed; I was just playing around to check everything before posting.)
    Any pointers as to why or how to resolve this?
    example output with the above:
    show configuration
    ## Last commit: 2014-01-09 09:34:44 EET by someone
    interfaces {
        xe-0/0/0 {
        xe-0/0/1 {
            description xxxx;
        xe-0/1/0 {
            description xxxx;
        xe-0/1/1 {
            description xxxx;
        xe-0/2/0 {
            disable;
        xe-0/2/1 {
            description xxxx;
        xe-0/3/0 {
            description xxxx;
        xe-0/3/1 {
            description xxxx;
        ae0 {
            description "xxxx";
        ae1 {
            description xxxx;
        demux0 {
        lo0 {
    {master}
    Thanks in advance!
    Spyros

    You are absolutely right!!  I was doing research online after posting the above.  The correct RADIUS attribute to use is actually CVPN3000/ASA/PIX7.x-Group-Based-Address-Pools.  Then create the pool in ASA, and call that pool's name in ACS under that RADIUS attribute.  Someone explained this perfectly in this community before.  Much appreciate your answer!
    Here's from another post last year:
    ACS  5 does not have the feature of IP pools. Logically its always good to  setup pools locally on vpn server and if you want user to pick ip from  specific local pool you can configure acs to push that attribute.
    On ACS Go to > Policy Elements  -> Network Access ->   Authorization Profiles -> Create ->
    Name of the Policy ->Dictionary Type: Radius-Cisco VPN 3000/ASA/PIX7.x
    Attribute Type : CVPN3000/ASA/PIX7.x-Group-Based-Address-Pools
    Attribute Type: String
    Attribute Value : Static MYPOOL (Name of the Pool which is defined on the ASA)
    Access Policies ->Default Network Access -> Authorization ->  Create -> Under result section call the Authorization p

  • Can somebody help me in getting some good material for Regular Expressions and IP Community list

    Can somebody help me in getting some good material for Regular Expressions and IP Community list?

    I'm not sure what you mean by "IP Community list", but here are 3 reference sites for Regular Expressions:
    Regular Expression Tutorial - Learn How to Use Regular Expressions
    http://www.regular-expressions.info/tutorial.html
    Regular Expressions Cheat Sheet by DaveChild
    http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
    Regular Expressions Quick Reference
    http://www.autohotkey.com/docs/misc/RegEx-QuickRef.htm
