JDOM attribute converts tab characters to a space separator

I am using a well-formed XML file to parse delimited data. I am passing a tab character in an attribute called delimiter. When the text values from my elements are read in, the tab character is read properly. The tab character that is stored in the delimiter attribute, however, is read in as a space separator rather than a tab. Is this a bug, or should I be parsing the attribute differently? Below is my sample code and XML stream:
public static void main(String args[]) {
    try {
        File file = new File("c:\\output.xml");
        Element rootElement = getRootElement(new FileReader(file));
        System.out.println("Element text is tab: " + rootElement.getText().equals("\t"));
        System.out.println("Attribute text is tab: " + rootElement.getAttributeValue("delimiter").equals("\t"));
        System.out.println("Attribute type: " + Character.getType(rootElement.getAttributeValue("delimiter").charAt(0)));
        // returns type 12 (space separator)
    } catch (Exception e) {
        e.printStackTrace(); // report failures instead of silently swallowing them
    }
}

public static Element getRootElement(Reader reader) throws JDOMException, FileNotFoundException {
    Element rootElement = null;
    SAXBuilder builder = new SAXBuilder();
    try {
        // Generate JDOM Document
        org.jdom.Document jdomDoc = builder.build(reader);
        rootElement = jdomDoc.getRootElement();
    } catch (JDOMException e) {
        // indicates a well-formedness or validity error
        throw new JDOMException(e.getMessage());
    }
    return rootElement;
}
Below is the xml data that I used to emulate this test:
<DELIM_TEST delimiter="     ">     </DELIM_TEST>
NOTE: replace the delimiter and text values with a tab character if the html post converts them to a space.

I don't think it's a bug, I think it's a feature. Have a look at the XML specs, which are here:
http://www.w3.org/TR/REC-xml#sec-white-space
and see if section 2.10, White Space Handling, describes what you are describing. I found it rather obscure. Also, Michael Kay's book on XSLT has about six pages devoted to white space handling, and at one point it says
"The XML parser will normalize attribute values. A tab or newline will always be replaced by a single space, unless it is written as a character reference..."

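For what it's worth, the rule the quote refers to is spelled out in section 3.3.3 of the XML recommendation (Attribute-Value Normalization). A quick way to see it in action is to parse the same element twice, once with a literal tab in the attribute and once with the character reference &#9;. The sketch below is not JDOM code: it assumes the JDK's built-in DOM parser, and the class name TabAttributeDemo and the helper delimiterOf are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's DOM parser rather than JDOM.
class TabAttributeDemo {

    // Parses the XML string and returns the root element's "delimiter" attribute.
    static String delimiterOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
            return root.getAttribute("delimiter");
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Literal tab inside the attribute value: the parser normalizes it to a space.
        String literalTab = "<DELIM_TEST delimiter=\"\t\">\t</DELIM_TEST>";
        // Tab written as a character reference: it is not normalized away.
        String charRef = "<DELIM_TEST delimiter=\"&#9;\">\t</DELIM_TEST>";

        System.out.println("Literal tab kept: " + delimiterOf(literalTab).equals("\t"));
        System.out.println("Character reference kept: " + delimiterOf(charRef).equals("\t"));
    }
}
```

If your parser behaves the same way, writing the attribute as delimiter="&#9;" in the XML file should let getAttributeValue("delimiter") return a real tab.
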
Similar Messages

  • How to convert tabs between the words in a String to a single space in java

    I have a string with runs of tabs between the words, and I need to convert each run of tabs into a single space.
    I used replaceAll("\t", " "), which does convert the tabs into spaces, but if there are 3 tabs between 2 words, those 3 tabs become 3 spaces, and I need only one space.

    user548049,
    How is your question related to the topic of this forum: Oracle Database-embedded JVM?
    Anyway, you need to use the greedy quantifier, +, like so:
    replaceAll("\\t+", " ")
    Good Luck,
    Avi.
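
    A minimal sketch of the difference between the two patterns, assuming nothing beyond String.replaceAll; the class name CollapseTabs and the helper collapseTabs are made up for illustration:

```java
// Hypothetical demo class showing why the + quantifier matters.
class CollapseTabs {

    // Replaces every run of one or more tabs with a single space.
    static String collapseTabs(String text) {
        return text.replaceAll("\\t+", " ");
    }

    public static void main(String[] args) {
        String input = "one\t\t\ttwo\tthree";

        // Without the quantifier each tab becomes its own space.
        System.out.println("[" + input.replaceAll("\\t", " ") + "]");

        // With the quantifier a whole run of tabs becomes one space.
        System.out.println("[" + collapseTabs(input) + "]");
    }
}
```

    The first call leaves three spaces between "one" and "two"; the second leaves one.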

  • Abap TO CONVERT SPECIAL CHARACTERS TO SPACE

    I have a field in BI, "zpustreg", whose values contain - and #, which prevents me from loading the data into the cube. I am writing this code in the transformation to convert any special character to a space, but it raises errors. If you can help me fix the code below I would really appreciate it.
    Abap Code to Load ZPUSTREG
      METHOD compute_ZPUSTREG.
      IMPORTING
        request     type rsrequest
        datapackid  type rsdatapid
        SOURCE_FIELDS-/BIC/ZPUSTREG TYPE /BIC/OIZCUSTREG
       EXPORTING
         RESULT type tys_TG_1-/BIC/ZPUSTREG
        DATA:
          MONITOR_REC    TYPE rsmonitor.
    $$ begin of routine - insert your code only below this line        -
    ... "insert your code here
    *DATA:Monitor_REC TYPE rsmonitor.
    DATA:L_D_OFFSET LIKE sy-index.
    CONSTANTS:c_allowed(60) TYPE c.
    Value `ABCDEFGHIJKLMNOPQRSTUVWXYZ!"%&'()*+,-/:;<=>?_0123456789_`.
    RESULT = SOURCE_FIELDS-/BIC/ZPUSTREG.
    TRANSLATE RESULT TO UPPER CASE.
    DO 60 TIMES.
    L_d_offset = sy-index - 1.
    IF RESULT+L_d_offset(1) CO c_allowed.
    ELSE.
    RESULT+L_d_offset(1) = ` `.
    ENDIF.
    ENDDO.
    ENDMETHOD.
    E:Statement "VALUE" is not defined. Check your spelling.
    E:Unable to interpret "C". Possible causes of error: Incorrect spelling
    Thanks
    Soniya

    Hello Soniya
    There is simply a '.' in the wrong place. It ends the CONSTANTS statement before the VALUE addition:
    CONSTANTS:c_allowed(60) TYPE c.  "<- wrong, remove the period
    Value `ABCDEFGHIJKLMNOPQRSTUVWXYZ!"%&'()*+,-/:;<=>?_0123456789_`.
    Corrected:
    CONSTANTS:c_allowed(60) TYPE c
    Value `ABCDEFGHIJKLMNOPQRSTUVWXYZ!"%&'()*+,-/:;<=>?_0123456789_`.
    That's it.
    Regards
      Uwe

  • Convert tab separated text to non-trivial xml. (pwsafe -- KeePassx)

    I'd like to take the output of `pwsafe --exportdb > database.txt` and convert it to a KeePassX XML friendly format (feature request in pwsafe).
    I found flat file converter but the syntax is beyond me with this example.  Solutions are welcomed.
    More details
    Here is the pwsafe --> KeePassX XML translations.  The pwsafe export is simply a txt file with 6 fields (the first field can be ignored):
    uuid= doesn't translate
    group= group>title
    name= entry>title
    login= entry>username
    passwd= entry>password
    notes= entry>comment
    Example txt file for conversion (exported from pwsafe):
    # passwordsafe version 2.0 database"
    uuid group name login passwd notes
    "123d9-daf-df-3423423" "retail" "amazon" "myamazonuser" "sjfJ849" "superfluous comment"
    "4599d934-dsfs-324" "retail" "netflix" "netflixuser" "dj3W$#" ""
    "4kdfkd-434-jj" "email" "gmail" "mygmail" "dfkpass" ""
    Example xml in keepassx xml:
    <!DOCTYPE KEEPASSX_DATABASE>
    <database>
    <group>
    <title>Internet</title>
    <entry>
    <title>github</title>
    <username>githubusername</username>
    <password>githubpassword</password>
    <comment>optional comment</comment>
    </entry>
    </group>
    <group>
    <title>retail</title>
    <entry>
    <title>amazon</title>
    <username>username</username>
    <password>myamazonpw</password>
    </entry>
    </group>
    <group>
    <title>retail</title>
    <entry>
    <title>netflix</title>
    <username>username</username>
    <password>mynfxpw</password>
    </entry>
    </group>
    </database>
    Last edited by graysky (2015-06-14 18:27:17)

    It seems that ffe does not work quite well with tab/space separated list (at least I was not able to get it working). However you can use sed to replace tabs with commas:
    sed -e "s/\t/,/g" input.txt | ffe -c pwsafe.rc -s example.txt -l 2>/dev/null
    and this is pwsafe.rc
    structure pwsafe {
        type separated , *
        output group
        quoted
        header first
        record line {
            field uuid * * noprint
            field title
            field title * * entry
            field username * * entry
            field password * * entry
            field comment * * entry
        }
    }
    output noprint {
        data ""
        no-data-print no
    }
    output entry {
        data " <%n>%t</%n>\n"
        no-data-print no
    }
    output group {
        file_header "<!DOCTYPE KEEPASSX_DATABASE>\n <database>\n"
        record_header " <group>\n"
        data " <%n>%t</%n>\n <entry>\n"
        record_trailer " </entry>\n </group>\n"
        file_trailer "</database>\n"
        no-data-print no
    }
    If you are not bound to ffe, maybe it is easier to use a python script which is simpler to modify and it is more flexible. Something like the following should work:
    #!/usr/bin/python
    import sys
    from xml.dom import minidom

    class Converter(object):
        def __init__(self, filename):
            self.url = filename

        def convert(self):
            inp_f = open(self.url, 'r')
            data = inp_f.readlines()
            inp_f.close()
            # xml document model
            doc = minidom.Document()
            root = doc.createElement('database')
            doc.appendChild(root)
            for line in data:
                if '"' not in line:
                    continue
                fields = line.split('\t')
                if len(fields) < 6:
                    continue
                # uuid = fields[0].strip('"') # unused
                group = fields[1].strip('" ')
                name = fields[2].strip('" ')
                login = fields[3].strip('" ')
                passwd = fields[4].strip('" ')
                notes = fields[5].strip('" \n')
                group_node = doc.createElement('group')
                root.appendChild(group_node)
                # <group>
                group_title_node = doc.createElement('title')
                group_node.appendChild(group_title_node)
                group_title_node.appendChild(doc.createTextNode(group))
                # one <entry> per <group>
                entry_node = doc.createElement('entry')
                group_node.appendChild(entry_node)
                # <entry> -> <title>
                entry_title_node = doc.createElement('title')
                entry_title_node.appendChild(doc.createTextNode(name))
                entry_node.appendChild(entry_title_node)
                # <entry> -> <username>
                entry_uname_node = doc.createElement('username')
                entry_uname_node.appendChild(doc.createTextNode(login))
                entry_node.appendChild(entry_uname_node)
                # <entry> -> <password>
                entry_passwd_node = doc.createElement('password')
                entry_passwd_node.appendChild(doc.createTextNode(passwd))
                entry_node.appendChild(entry_passwd_node)
                # <entry> -> <comment>
                entry_comment_node = doc.createElement('comment')
                entry_comment_node.appendChild(doc.createTextNode(notes))
                entry_node.appendChild(entry_comment_node)
            print('<!DOCTYPE KEEPASSX_DATABASE>')
            print(root.toprettyxml(' ', '\n'))

    if __name__ == "__main__":
        try:
            ifname = sys.argv[1]
        except IndexError:
            print("Input file name required")
            sys.exit(1)
        cc = Converter(ifname)
        cc.convert()
    NOTE: I'm assuming from the example you posted that every <group> node contains only one <entry> child node.
    --edit[0]: corrected a typo in the python code (just realized that it is <comment> instead of <comments>) and made the parsing function more reliable
    Last edited by mauritiusdadd (2015-06-15 09:29:01)

  • Tab characters

    Hello,
    Is there a way to enter tab characters in BB text applications, e.g. Tasks, e-mail, Calendar, etc.?  (I have an 8700c.)
    For example, if I use tab characters in an Outlook Task and sync it with the BB, the tab character appears in the Task item on the BB; I can even scroll through the text and the cursor will jump from the beginning of the tab to the end of it just as it would on a PC.  This shows that tabs are recognized by the BB since they're not replaced with spaces or something else.  But how to type them on the BB directly is the problem...thanks for any help.

    Xandrez: On a Curve 8330, I copy a known tab character, open AutoText, choose tb as the characters to be replaced, and hit paste to paste in the tab character - and it doesn't work. The AutoText entry shows tb ()
    If I copy the known tab character, open Memo Pad, and paste the known tab character into existing text, it doesn't add the tab character. What it does do is this: I type t, which it accepts. When I type b, the b doesn't appear.
    When I edited the AutoText entry to replace tb with a number of spaces, it worked. The entry showed tb (   )
    It appears I'm not copying the tab character. I hold down the shift key, roll the trackball over one tab character, and choose menu | copy. Can you tell where I'm going wrong?
    TIA,
    Pete

  • Itunes is copying/converting 500gb of music from .wma to .m4a files that are on an external drive onto my laptop which doesnt have enough room. how do i reroute these newly converted files somewhere with enough space for them?

    I am new to iTunes. I used Windows Media Player before and always ripped music directly to an external hard drive and accessed it through the player. Now that I have downloaded iTunes, it takes each song from the external hard drive and copies a second file onto my laptop hard drive, which does not have the capacity for all my music. As iTunes converts music files, I want them saved back onto my external drive, or another location I have space for, rather than the laptop. How do I change the setting to move the iTunes media folder to another location? I assume that copying and pasting it into a random location will cause a few errors.

    When I have done this, all I did was network the two machines and copy the contents of the iTunes folder to the other machine, and that's it.
    My understanding (which may not be 100% correct) is that the one file that is absolutely necessary is iTunes Library, and that the XML file is actually a copy auto-generated from the iTunes Library, appearing in a different format only for non-iTunes apps that take advantage of the iTunes Library data.
    As far as I know, if you simply have the Library (database) file and all your original music files, iTunes on that computer should operate as it did on your old computer. I believe that the Album Artwork, Genius Data, and XML files can be regenerated from the Library file. Not sure about the Extras file.

  • IDOC Data record is appending with NULL characters instead of spaces.

    Hi Gurus,
    1)     We have created a port with Japanese characters for MATMAS05 (IDOC type) and trying to download an IDOC into an XML file using the ADAPTER, the actual data is less than the length of the IDOC string so we need to append the remaining spaces to each data record which in turn fills the segment pad but whereas in NON-UNICODE server the data record is appending with NULL characters instead of spaces.
    2)     For Japanese port the receiver port name in XML file is appearing with some junk characters in NON-UNICODE client, whereas in UNICODE client it is displaying the correct port name with Japanese characters.
    Your help will be appreciated.
    Thanks in Advance.

    ORA-06512 indicates a numeric or value-error line 2 seems to show to the first statement.
    Check the datatypes of your columns/items.
    Try to issue an update manually in SQL*Plus to see if it works generally.

  • How to reject tab characters in EDIFACT (EDIEL) 10.1.2.3

    We get tab characters in alphanumeric fields in some EDIFACT (EDIEL) messages, which should not be supported. The messages are forwarded to an external partner who rejects them. We would like to reject them when we (B2B) receive them instead.
    Where do we control this? I have checked $ORACLE_HOME/ip/oem/edifecs/XEngine/config/actions-edifact.ecs, but I can't find tab as a legal character for any of the encodings (UNOA, UNOB, UNOC) that we use, so it should be rejected.

    Hi,
    Try Launching the FDM with url something like http://fdmservername/HyperionFDM---- which is in 9.3 version.
    You can also try something like logon to server where fdm is installed and open iis(inetmgr [command]) and expand websites and try browsing the FDM website by right clicking on FDM website and browse.
    it will try launching FDM login page with a url being displayed at the bottom on inetmgr window. ex:http://localhost/HyperionFDM.
    Thanks
    Amith

  • To convert special characters to English characters.

    Is there any function module to convert special characters (Latin) to English characters?

    Hi Meera,
         Welcome To SDN!!
    try using the function module 'SCP_REPLACE_STRANGE_CHARS'.
    Give the variable with special characters in intext.
    Regards
    Kiran Sure

  • New fields for Attributes Data Tab  in Org Model

    Hi, Guru's
    I want to add or remove fields on the Org model ---> Attributes data tab, such as:
    Division
    Currency
    region
    dist.chnl
    In the same way I want to add another new field XXXX and its respective values.
    How can this be done?
    Regards
    CR Gupta

    Thank you.
    I didn't get the final Object
    Can any one give solution.
    system automatically generating the Table = HRT1222 and Field = LOW,
    But
    I need to maintain some values and to be select one from that list.
    How can I overcome ?
    Thank You.
    Edited by: CR.Gupta on Oct 1, 2011 7:53 PM

  • Converting String Characters into Regular Expression Automatically?

    Hi guys.... is there any program or sample coding which is available to convert string characters into regular expression automatically when the program is run?
    Example:
    String Character Input: fnffffffffffnnnnnnnnnffffffnnfnnnnnnnnnfnnfnfnfffnfnfnfnfnfnnnnd
    When the program runs, it automatically convert into this :
    Regular Expression Output: f*d

    hey guys.... i am sorry for not providing all the information that you guys need as i was rushing off to urgent meeting... for my string characters i only have a to n.. all these characters are collected from sensors and stored inside database... from many demos i have done... i found out that every demo has different strings of characters collected and these string of characters will not match with the regular expressions that i had created due to several unwanted inputs and stuff... i have a lot of different types of plan activities and therefore a lot of regular expressions.... if i put [a-z|0-9]*... it will capture all characters but in the same time it will be showing 1 plan only.... therefore, i am finding ways to get the strings i collected and let it form into regular expression by themselves in the program so that it will appear as different plans as output with comparing with the regular expression that i had created.... is there any way to do so?
    please post again if there is any questions u are still not familiar with... thank you...

  • RSAT and the missing Attribute Editor tab [solution]

    PROBLEM
    When you install RSAT on a Vista workstation or Server 2008 system, that is managing a 2000/2003 based forest, you do not see the Attribute Editor tab when looking at the properties of a User or Computer object in Active Directory Users and Computers(ADUC).
    MORE INFORMATION
    The Display Specifier is not updated in the Configuration Naming Context, because the 2008 schema changes have not been executed on the forest. Part of the upgrade updates the forest Display Specifiers. The Attribute Editor tab actually uses functions within the ADSIEDIT tool, more specifically the ADSIEDIT.DLL extension. Although the DLL is probably registered on the RSAT system, the ConfigNC needs updating in order to expose the tab in the ADUC interface.
    SOLUTION
    Use the ADSIEDIT tool (or other tool of choice...ADexplorer, LDP etc), with a user who has rights to modify the Configuration Naming Context.
    Navigate to cn=<languagepage>, cn=configuration, dc=<domainname>
                 (where <languagepage> is your relevant language...see http://support.microsoft.com/kb/324097)
                 (where <domainname> is your domain dn)
    Under the cn=User-Display object, edit AdminPropertyPages and add the line 11,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}
    Under the cn=Computer-Display object, edit AdminPropertyPages and add the line 12,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}
    Under the cn=Default-Display object, edit AdminPropertyPages and add the line 4,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}
    Open ADUC from the Vista or Server 2008 and you should now see the Attribute Editor for the object you selected.  Note: Attribute editor is only shown when ADUC is in Advanced View.
     -Stuart Hudman

    Never mind, i got it.... the prefix number is irrelevant as long as it is in sequential order... to be able to see the attribute editor tab for groups i just
    Under the cn=group-Display object,
    edit AdminPropertyPages and
    add the line 5,{c7436f12-a27f-4cab-aaca-2bd27ed1b773}
    the first number (5,) was my next available number.
    FYI: English is 409. the actual DN would be cn=409,
    cn=DisplaySpecifiers, cn=configuration, dc=<domainname>

  • Algorithm for converting Unicode characters to EBCDIC

    I would like to know if there is any algorithm for converting Unicode Characters to EBCDIC.
    Awaiting your replies
    Thanks in advance,
    Ravi

    "I would like to know if there is any algorithm for converting Unicode Characters to EBCDIC."
    "Isn't EBCDIC a 7-bit code like ASCII? Unicode is 16-bit. This means there is no way Unicode can be mapped on EBCDIC without loss of information."
    No. That is like saying that since UTF-8 is 8 bit based then it can't be mapped to UTF-16. But it does.
    EBCDIC either directly supports or has versions which support multibyte character sets. A multibyte character set can encode any fixed format sized character set. The basic idea is the same way UTF-8 works.
    Multibyte character sets have the added benefit that most of the data in the world is from the ASCII character set and the encodings always support that using only 8 bits. Thus the memory savings over UTF-16 (or UTF-32) are significant.
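
    In Java the conversion itself usually needs no hand-written algorithm, because the charset machinery can do the mapping. A minimal sketch, assuming the single-byte EBCDIC code page IBM037 is the one wanted and is installed on the JVM (it is checked at run time below); the class name EbcdicDemo and its helpers are made up for illustration:

```java
import java.nio.charset.Charset;

// Hypothetical demo class; "IBM037" (US EBCDIC) is an assumed target code page.
class EbcdicDemo {

    // Encodes Java (Unicode) text into bytes of the named charset.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decodes bytes of the named charset back into a Java string.
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String name = "IBM037";
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not available on this JVM");
            return;
        }
        byte[] ebcdic = encode("HELLO", name);
        StringBuilder hex = new StringBuilder();
        for (byte b : ebcdic) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim());   // hex dump of the EBCDIC bytes
        System.out.println(decode(ebcdic, name));    // round trip back to Unicode
    }
}
```

    Characters that have no counterpart in the chosen code page are replaced by a substitution byte by String.getBytes, so a single-byte code page like this one does lose information for text outside its repertoire.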

  • After Effects error: could not convert Unicode characters. (23 :: 46)

    Hello,
    I'm getting the following error message:
    After Effects error: could not convert Unicode characters. (23 :: 46)
    I have yet to find an answer that works to resolve this problem. I'm using CS6 on an HP Z220 on Windows 7.
    Thanks in advance.

    So I solved the problem. A little history for this situation: I created a new AE project and while attempting to import a file received the error message:  After Effects error: could not convert Unicode characters. (23 :: 46)
    I then tried to import a Vanishing Point, which broke the spell on the error message and allowed the menu for selecting a vanishing point to appear. I closed out of that and was then able to import files.

  • Windows 2003 Active Directory Attribute Editor Tab

    My Active Directory does not have an Attribute Editor Tab....how do I add it?

    Bradheld is correct, attribute editor tab was introduced in windows 2008. To view the attribute editor tab from vista/windows 2008 & above for 2000/2003 forest, refer below article.
    http://social.technet.microsoft.com/Forums/windowsserver/en-US/6e6ef6bd-b5c9-4f16-b346-097832e3b93c/rsat-and-the-missing-attribute-editor-tab-solution?forum=winserverManagement
    Awinish Vishwakarma - MVP
