ASCII control characters in XML

Hi guys,
I want to export InDesign content to an XML file using JavaScript.
The problem is, except for TAB, LF, and CR the control characters (those below ASCII 32) are not allowed in well-formed XML and any parser worth its salt will puke.
Adobe (mis)uses some ASCII control characters for their own purposes; e.g. "End nested format here" becomes ASCII 3 (ETX - end of text).
How can I export well-formed XML? I can replace these characters, but I don't know what to replace them with so that the result is still correct and usable in my web page.
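A minimal sketch of the replacement step, written in Java for illustration (the same loop ports directly to ExtendScript in InDesign). It assumes you simply want to drop the forbidden characters or swap in a placeholder of your own choosing; the class and method names are hypothetical. Writing the characters as numeric character references such as &#3; is not an option either, because XML 1.0 forbids references to those characters as well.

```java
// Sketch (hypothetical names): make text safe for XML 1.0 by replacing the
// C0 control characters the spec forbids. TAB, LF and CR are allowed and kept.
public final class XmlControlChars {

    /** True for characters below 0x20 other than TAB, LF and CR. */
    static boolean isForbiddenControl(char c) {
        return c < 0x20 && c != '\t' && c != '\n' && c != '\r';
    }

    /** Replaces every forbidden control character with the given string ("" drops it). */
    static String sanitize(String text, String replacement) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isForbiddenControl(c)) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // ETX (code 3) is what InDesign uses for "End nested format here".
        String exported = "Lead-in\u0003body text\twith a tab";
        System.out.println(sanitize(exported, ""));  // ETX dropped, tab kept
        System.out.println(sanitize(exported, " ")); // ETX becomes a space
    }
}
```

If the nested-style markers carry meaning you need in the web page, an alternative is to emit an element of your own at export time instead of the raw character; which element to use depends on your own schema.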

Did you know there is an InDesign Scripting Forum, too?
http://forums.adobe.com/community/indesign/indesign_scripting

Similar Messages

Converting control characters to spaces in a Unicode program?

I want to take an ASCII character string and convert any ASCII control characters to spaces.
In a non-Unicode program, I define the following hex constant:
CONSTANTS: c_control_to_space(64) TYPE x VALUE
'00200120022003200420052006200720082009200A200B200C200D200E200F20' &
'10201120122013201420012016201720182019201A201B201C201D201E201F20'.
I then execute the following TRANSLATE statement:
      TRANSLATE w_transcript USING c_control_to_space.
What would be the "approved" method of accomplishing the same effect
in a Unicode program?

Neil,
First, thank you for pointing out my typo. You are correct that the "0120" in the second line of the literal was intended to be "1520".
Second, thank you for your suggestion. Based on your idea, I tried something similar, but not exactly what you suggested. In particular, since I can't figure out how to construct the constant that I want, I used your idea to construct it as a variable, as follows:
DATA number TYPE i.
DATA offset TYPE i.
DATA hex(4) TYPE x.
FIELD-SYMBOLS <char> TYPE c.
ASSIGN hex TO <char> CASTING TYPE c.
DATA w_control_to_space(64) TYPE c.
DO 32 TIMES.
    hex = sy-index - 1.
    offset = 2 * ( sy-index - 1 ).
    number = STRLEN( <char> ).
    IF number GT 1.
      SUBTRACT 1 FROM number.
      SHIFT <char> LEFT BY number PLACES.
    ENDIF.
    w_control_to_space+offset(1) = <char>.
ENDDO.
After having constructed "w_control_to_space", I can now use the TRANSLATE statement:
TRANSLATE w_transcript USING w_control_to_space.
This code passes the Unicode syntax checks and works correctly on a non-Unicode system. I don't have access to a Unicode system on which to run it. I'd appreciate any feedback on this approach - especially if someone can actually test it on a Unicode system.
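For comparison, the same table-driven idea (one replacement entry for each code 0 to 31, applied in a single pass, like TRANSLATE ... USING) looks like this in Java, where strings are always UTF-16 and no Unicode/non-Unicode distinction arises. This is an analogue, not ABAP, and the class name is hypothetical.

```java
import java.util.Arrays;

// Sketch: table-driven translation of ASCII control characters (0-31) to spaces.
public final class ControlToSpace {

    // One entry per char value below 32; every entry maps to a space.
    private static final char[] TABLE = new char[32];

    static {
        Arrays.fill(TABLE, ' ');
    }

    static String translate(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] < TABLE.length) {
                chars[i] = TABLE[chars[i]];
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println("[" + translate("line1\r\nline2\u0007") + "]"); // [line1  line2 ]
    }
}
```

Keeping an explicit table (rather than a single comparison) makes it easy to map individual codes to something other than a space later.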

Wrong ASCII values for control characters in Variables and Call Stack in CVI2013?

Hi,
I think there is an error in the "Variables and Call Stack" window when you view your variables in ASCII format.
The control characters (0 - 31) are not shown correctly. They appear shifted by 2.
For example:
The character in decimal format is 10 (LF), but when you change to ASCII format it shows \012.
The same happens with 13 (CR): this character is shown as \015 in ASCII format.
I don't think this was a problem in CVI2012.
Best regards
Gunther

I'm not using CVI2013 yet, so I cannot comment on this specific product, but the codes you are showing are the octal equivalents of the decimal values you specified: it could be that control characters (or, generally speaking, non-printable ones) are displayed as their octal equivalents in string view.
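The explanation is easy to check: 10 decimal is 12 octal and 13 decimal is 15 octal, so \012 and \015 are octal escapes of LF and CR rather than values shifted by 2. A small Java sketch of that formatting (the method name is illustrative; it does not reproduce CVI's own display code):

```java
// Shows that \012 and \015 are the octal spellings of decimal 10 and 13.
public final class OctalEscapes {

    /** Formats a character code as a backslash followed by three octal digits. */
    static String octalEscape(int code) {
        return String.format("\\%03o", code);
    }

    public static void main(String[] args) {
        System.out.println("10 (LF) -> " + octalEscape(10)); // 10 (LF) -> \012
        System.out.println("13 (CR) -> " + octalEscape(13)); // 13 (CR) -> \015
    }
}
```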
Proud to use LW/CVI from 3.1 on.

Replace control characters in IDoc

Hello!
In an IDoc -> XI -> IDoc scenario we have messages failing with
com.sap.engine.lib.xml.parser.ParserException: Invalid char #0x0(:main:, row:1, col:5446)
When I opened the XML in a hex editor and looked at column 5446 of row 1, it showed several 0x0 values, as mentioned in the error message in SXMB_MONI.
Characters with a hexcode less than 0x20 are non-printable control characters which are not allowed to appear in the text of an XML tag according to the XML specification.
However, I'm wondering why the IDoc adapter does not delete or replace those control characters with corresponding XML entities before it reaches the Integration Engine.
The control records are entered by business users in SAP when they fill out free-text fields. Any ideas what could be done on the SAP or XI side to prevent these parsing errors automatically, so that the messages are processed successfully? Thanks.
Regards, Tanja

Hello!
Are you using "Apply Control Record Values from Payload" in the IDoc receiver communication channel?
Yes, we have the "Apply Control Record Values from Payload" indicator turned on in the receiver IDoc adapter as we need to have the IDoc control record fields filled from the IDoc XML payload.
Regards, Tanja

How to remove special characters in xml

Dear friends,
How can I remove special characters from the XML? I place the XML file and fetch it through the file adapter.
The problem is that whenever there is a special character in the XML, I am not able to pass it to the target system smoothly.
The customer wants the file adapter to be scheduled, and for that the source XML must not contain any special characters.
How can I achieve this, friends?
Thanks in advance.
Take care

Hi Karthik,
Go through the following links on how to handle special characters:
https://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/9420 [original link is broken]
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
Restricting special characters in XML within XI..
Regards
Goli Sridhar
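If "special characters" here means the characters that are significant in XML markup, the usual remedy is to escape them rather than remove them: &, <, >, " and ' each have a predefined entity. A minimal Java sketch under that assumption (the class name is hypothetical). It does not help with control characters, which XML 1.0 does not allow even in escaped form.

```java
// Sketch: escape the five characters that have predefined XML entities.
public final class XmlEscape {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Smith & Sons <GmbH>")); // Smith &amp; Sons &lt;GmbH&gt;
    }
}
```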

? is shown for Norwegian characters when XML document is parsed using DOM

Hi,
I have a sample program that creates an XML document with a single element, book, containing Norwegian characters. The encoding is UTF-8. When I parse the XML document and read the value of that element, question marks are shown in place of the Norwegian characters. The XML document's file name is "Sample.xml".
            DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
            Document doc = docBuilder.newDocument();
            Element root = doc.createElement("root");
            root.setAttribute("value", "Á á Ą ą ä É é Ę");
            doc.appendChild(root);
            TransformerFactory transfac = TransformerFactory.newInstance();
            Transformer trans = transfac.newTransformer();
            trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            trans.setOutputProperty(OutputKeys.INDENT, "yes");
            trans.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            //create string from xml tree
            java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
            StreamResult result = new StreamResult(baos);
            DOMSource source = new DOMSource(doc);
            trans.transform(source, result);
            writeToFile("Sample.xml", baos.toByteArray());
            InputSource is = new InputSource(new java.io.ByteArrayInputStream(readFile("Sample.xml")));
            is.setEncoding("UTF-8");
            DocumentBuilder obj_db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document obj_doc = obj_db.parse(is);
            obj_doc.normalize();
            System.out.println("Value is : " + new String(((Element) obj_doc.getElementsByTagName("root").item(0)).getAttribute("value").getBytes()));
writeToFile() - Writes the document bytes to the Sample.xml file
readFile() - Reads the Sample.xml file
When I run this program, the XML editor shows the characters correctly, but the Java code outputs: Á á ? ? ä É é ?
What's the problem in my Java code? I haven't found help anywhere else. Please suggest a solution to this problem.
Thanks in advance.

Hi,
I'm using JBuilder 2005 and I set the encoding to UTF-8 both for saving Java source files and for compilation. I've also modified my source code, but the problem persists. After applying the changes, the dumped sample.xml file no longer displays these characters correctly in IE, although it did before.
Modified code is:
            DocumentBuilderFactory dbfac = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = dbfac.newDocumentBuilder();
            Document doc = docBuilder.newDocument();
            Element root = doc.createElement("root");
            root.setAttribute("value", "Á á Ą ą ä É é Ę");
            doc.appendChild(root);
            OutputFormat output = new OutputFormat(doc, "UTF-8", true);
            java.io.ByteArrayOutputStream baos = new java.io.ByteArrayOutputStream();
            OutputStreamWriter osw = new OutputStreamWriter(baos, "UTF-8");
            XMLSerializer s = new XMLSerializer(osw, output);
            s.asDOMSerializer();
            s.serialize(doc);
            writeToFile("Sample5.xml", baos.toByteArray());
            InputSource o = new InputSource(new java.io.ByteArrayInputStream(readFile("Sample5.xml")));
            o.setEncoding("UTF-8");
            com.sun.org.apache.xerces.internal.parsers.DOMParser obj_parser = new com.sun.org.apache.xerces.internal.parsers.DOMParser();
            obj_parser.parse(o);
            Document obj_doc = obj_parser.getDocument();
            System.out.println("Value : " + new String(((Element) obj_doc.getElementsByTagName("root").item(0)).getAttribute("value").getBytes()));
I'm stuck on this issue. Can you please provide a code snippet that works with these characters, or suggest a solution?
Thanks
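A likely culprit in both versions is the last line: new String(...getBytes()) encodes the attribute value with the platform default charset and decodes it again. Assuming that default is a Western single-byte charset such as ISO-8859-1 or windows-1252 (the posts do not say), Ą, ą and Ę cannot be encoded, so getBytes() substitutes '?', which matches the output exactly, while Á, á, ä, É and é survive. The sketch below demonstrates this with explicit charsets; the literals use Unicode escapes so the demo does not depend on the source-file encoding, and the class name is hypothetical. The fix it points to is printing the String returned by getAttribute() directly, through a stream (and console) that can display those characters.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: why some accented letters turn into '?' and others survive.
public final class QuestionMarkDemo {

    /** Mimics new String(s.getBytes()) when the default charset is the given one. */
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "Á á Ą ą ä É é Ę" written with escapes
        String value = "\u00C1 \u00E1 \u0104 \u0105 \u00E4 \u00C9 \u00E9 \u0118";
        // ISO-8859-1 has no mapping for Ą, ą, Ę, so getBytes() substitutes '?'.
        String lossy = roundTrip(value, StandardCharsets.ISO_8859_1);
        // UTF-8 can encode every character, so this round trip is lossless.
        String intact = roundTrip(value, StandardCharsets.UTF_8);
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(lossy);  // contains: Á á ? ? ä É é ?
        out.println(intact); // contains: Á á Ą ą ä É é Ę
    }
}
```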

AppleScript puts control characters in txt file, inflating it enormously

Hi all,
I've come across a strange problem. I have an AppleScript file that puts up a dialog box at regular intervals asking me to write down what I am doing (to create an activity log). The file is saved to my Documents folder. This has been working just fine for several months (years), and now, all of a sudden, AppleScript fails on me. The reason is, the log file is inflated out of proportions because it contains hidden control characters (gremlins) between each letter and zillions of them between words and lines. Opening one of these files makes TextEdit go crazy ('Application not responding'). I discovered these gremlins when I opened the document in Classic mode with Word 5 (the best application ever to have been produced by MS, which allows me to remove them and make the file palatable to TextEdit again).
But that doesn't solve the problem: Every entry continues to be inserted with another load of gremlins, and I can't understand why. I deleted preferences both for TextEdit and AppleScript, and I performed regular disk maintenance.
TextEdit is set to save plain text files, end-of-line is Macintosh style, character set is Western Mac OS Roman. It's always been like this before when it worked ok.
Any ideas?
Thanks in advance.
G4 MDD 1.25 GHz, 768 MB RAM Mac OS X (10.3.9)

The solution to my problem can be found in the AppleScript forum here:
http://discussions.apple.com/message.jspa?messageID=2353871
With many thanks to reese_, who provided the solution, and to Tom, who directed me there!

Control Characters in Data

I've created a simple table with 2 rows. When I query the table for an individual row (where id = 1), my result set has control characters between 2 columns... sometimes.
If I select * from the table, I get a core dump, then an Oracle sqlplus error message that wants to send a report to Microsoft. Whether or not I choose to send it, sqlplus then dies.
These 2 rows are actually 2 rows in a much larger table. I encountered this problem in the larger table, thought my data was corrupted, so I created the exact same table, in a totally different schema, and pasted the data in the new table. When I received the same error, I dropped the table, recreated it, and hand entered all the data, with the same results.
Table structure:
id NUMBER(9) PRIMARY KEY
TABLE_ID NUMBER(9)
FILTER_ONE VARCHAR2(800)
FILTER_TWO VARCHAR2(100)
CLASS_FILTER VARCHAR2(1)
If I SELECT * FROM test_table WHERE ID = 27, my result set looks like this:
'blah blah (' ☻ '))' ☺ 'Y'
The 1st open parenthesis is the end of the filter_one column, the double parenthesis is the entire filter_two, and the Y is the class filter. The control characters come from I know not where, but my app cannot function - this is a dynamically built query and the query of course fails because sqlplus can't interpret the control characters.
Anyone ever heard of this? I should also mention that this query has worked for the past year, and has not been changed - it just suddenly stopped working.
Thanks!
Dave

Sure -
CREATE TABLE TEST (
ID NUMBER(9) PRIMARY KEY,
TABLE_ID NUMBER(9),
FILTER_ONE VARCHAR2(800),
FILTER_TWO VARCHAR2(100),
CLASS_FILTER VARCHAR2(1) );
INSERT INTO test VALUES(
27,38,'where id in (select decode(pci.config_comp_int_id,null,decode(pci.component_id,null,pci.spirit_comp_id, pci.component_id),
decode(cci.component_id,null,cci.component_id, cci.spirit_comp_id)) from c3net_units u
join (select uc.*,connect_by_root(unit_id) top_unit_id
from c3net_unit_config_int uc start with uc.unit_id is not null connect by prior uc.id = uc.parent_id) uci
on (u.id = uci.top_unit_id)
left join (c3net_unit_plat_int upi join c3net_plat_comp_int pci on
( upi.id = pci.unit_plat_int_id)) on ( uci.id = upi.unit_config_int_id)
left join (c3net_config_plat_int cpi join c3net_config_comp_int cci on (cpi.id = cci.config_plat_int_id))
on (uci.config_id = cpi.config_id) where u.version_id in (',
'))','Y');
INSERT INTO test VALUES(
28,39,'where id in (select decode(upi.config_plat_int_id,null,decode(upi.platform_id,null,upi.spirit_plat_id,
upi.platform_id),
decode(cpi.platform_id,null,cpi.spirit_plat_id, cpi.platform_id)) from c3net_units u
join (select uc.*,connect_by_root(unit_id) top_unit_id
from c3net_unit_config_int uc start with uc.unit_id is not null connect by prior uc.id = uc.parent_id) uci
on (u.id = uci.top_unit_id)
left join c3net_unit_plat_int upi on ( uci.id = upi.unit_config_int_id)
left join c3net_config_plat_int cpi on (uci.config_id = cpi.config_id) where u.version_id in (',
'))','Y');
When I then "select * from test where id = 27;" I get strange data - looks like control characters - in the filter_one and filter_two fields
When I change the id to 28 and re-query, it gives me the core dump and sqlplus error.
Thanks for your time guys

Hidden control characters in file generated by AppleScript?

Hi all,
I've come across a strange problem. I have an AppleScript app (kindly produced for me by a member here) that gets called up by cron, which puts up a dialog box at regular intervals asking me to write down what I am doing (to create an activity log). The log is saved to my Documents folder if it doesn't already exist; otherwise my typing will just get time-stamped and added to the file.
This has been working just fine for several months (years), and now, all of a sudden, AppleScript conks out on me -- it simply crashes. The reason, I discovered, is the log file itself. It gets inflated out of proportion (something like 2.5 MB where 40 KB would suffice) because it contains hidden control characters (gremlins) between each letter and zillions of them between words and lines.
When the script opens this inflated file, TextEdit balks ('Application not responding'), which in turn crashes AppleScript.
I discovered these gremlins when I opened the document in Classic mode with Word 5, which allows me to remove them and thus make the file smaller and therefore palatable to TextEdit again.
But that doesn't solve the problem: Every entry continues to be inserted with another load of gremlins, and I can't understand why. I deleted preferences both for TextEdit and AppleScript, and I perform regular disk maintenance.
TextEdit is set to save plain text files, end-of-line is Macintosh style, character set is Western Mac OS Roman. It's always been like this before when it worked ok.
Any ideas?
Thanks in advance.
G4 MDD 1.25 GHz, 768 MB RAM Mac OS X (10.3.9)

Hi Camelot,
The script doesn't specify TextEdit -- it just creates/updates a text file which I then read with TextEdit. Here's the script, and below is the link to the test log:
--STARTOFSCRIPT------------------------------------------
-- Find out if the log file is actually there.
-- We can't get the last entry of a file that doesn't exist.
set fileExists to false
tell application "Finder"
if exists file "MacHD:Users:gisela:Documents:Gisela'sActivityLog.txt" then set fileExists to true
end tell
-- Get the last entry in the log file to present to the user.
-- If the file wasn't there, present a default choice to the user.
if fileExists then
set myLogFile to open for access ((path to documents folder as text) & "Gisela'sActivityLog.txt")
set logFileContents to read myLogFile using delimiter return
close access myLogFile
set lastLogEntry to last item of text items of logFileContents
set oldTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
set lastEntry to text item 2 of lastLogEntry
set AppleScript's text item delimiters to oldTIDs
else
set lastEntry to "Nothin' much."
end if
-- Ask the user what they're doing.
set myLogEntry to text returned of (display dialog "So... What're you doing?" default answer lastEntry buttons {"OK"} default button "OK" with icon note)
-- Get the date and time via "Do Shell Script" (for me, easier than mucking with AppleScript's date results).
-- Modified order to Year Month Day# Weekday hour:min:sec [Gisela]
set dateUnix to do shell script "date"
set dateText to word 8 of dateUnix & " " & word 2 of dateUnix & " " & word 3 of dateUnix & " " & word 1 of dateUnix & " " & word 4 of dateUnix & ":" & word 5 of dateUnix & ":" & word 6 of dateUnix & " " & tab
-- Open the log file in the Documents folder. The file is created if it doesn't exist.
set myLogFile to open for access ((path to documents folder as text) & "Gisela'sActivityLog.txt") with write permission
-- Write a log entry into the file.
write dateText & myLogEntry & return to myLogFile starting at eof
-- Close the log file.
close access myLogFile
-- Script by Bryan K. Vines, Corpus Christi, TX, via Apple Discussion Forum 22/5/04 and 24/5/04
--ENDOFSCRIPT------------------------------------------
Here's the test log => http://www.webalice.it/gisela/TestLog.txt. Had to change the name because my webspace doesn't like apostrophes (').
This phenomenon only occurs in connection with this Activity log script. My gut tells me it's not AppleScript's fault, but I can't figure out where else to look.
Kind regards,
Gisela
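A NUL byte in front of every letter is the typical signature of text stored as UTF-16 and then read in a single-byte encoding such as Mac OS Roman. Whether that is what happened with this script is only a guess (the actual solution is in the linked AppleScript thread), and it does not by itself explain the extra gremlins between words and lines. The Java sketch below (not AppleScript; the class name is hypothetical) just shows the byte pattern: each ASCII letter becomes a 00 byte followed by its usual byte.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: the bytes of text stored as big-endian UTF-16.
public final class Utf16Gremlins {

    /** Space-separated hex bytes of s encoded as UTF-16BE. */
    static String utf16Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16BE)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A single-byte editor shows each 00 as an invisible "gremlin".
        System.out.println(utf16Hex("Hi")); // 00 48 00 69
    }
}
```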

Removing the Control Characters from a text file

Hi,
I am using the java.util.regex.* package to remove the control characters from a text file. I got the program below from the java.sun site.
It compiles successfully, but when I try to run it I get the following error:
------------------------------------------------------------------------
D:\Debi\datamigration>java Control
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal repetition
{cntrl}
at java.util.regex.Pattern.error(Pattern.java:1472)
at java.util.regex.Pattern.closure(Pattern.java:2473)
at java.util.regex.Pattern.sequence(Pattern.java:1597)
at java.util.regex.Pattern.expr(Pattern.java:1489)
at java.util.regex.Pattern.compile(Pattern.java:1257)
at java.util.regex.Pattern.<init>(Pattern.java:1013)
at java.util.regex.Pattern.compile(Pattern.java:760)
at Control.main(Control.java:24)
Please help me on this issue.
Thanks&Regards
Debi
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args) throws Exception {
        // Create file objects for the input and output files
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        // Open an input and an output stream
        FileInputStream fis = new FileInputStream(fin);
        FileOutputStream fos = new FileOutputStream(fout);
        BufferedReader in = new BufferedReader(new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(fos));
        // The pattern matches control characters
        Pattern p = Pattern.compile("{cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            m.reset(aLine);
            // Replaces control characters with an empty string
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        // Close the streams only after every line has been processed
        in.close();
        out.close();
    }
}

Hi,
I used the code below with the \p, but I wasn't able to compile the file. It gave me this error:
D:\Debi\datamigration>javac Control.java
Control.java:24: illegal escape character
Pattern p = Pattern.compile("\p{cntrl}");
^
1 error
Please help me on this issue.
Thanks&Regards
Debi
// The pattern matches control characters
Pattern p = Pattern.compile("\p{cntrl}");
Matcher m = p.matcher("");
String aLine = null;
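The compile error comes from Java's own string-literal escapes: \p is not a valid escape in a Java string, so the backslash has to be doubled for \p to reach the regex engine. The java.util.regex.Pattern documentation spells the POSIX class \p{Cntrl}. A corrected sketch, assuming UTF-8 input and output (the original relied on the platform default charset) and with the class renamed to avoid confusion with the posted one:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

public class StripControl {

    // "\\p{Cntrl}": the doubled backslash is for the Java string literal;
    // the regex engine receives \p{Cntrl}, i.e. [\x00-\x1F\x7F].
    private static final Pattern CONTROL = Pattern.compile("\\p{Cntrl}");

    static String stripControl(String line) {
        return CONTROL.matcher(line).replaceAll("");
    }

    public static void main(String[] args) throws IOException {
        // Default names mirror the original post; pass real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "fileName1");
        Path out = Paths.get(args.length > 1 ? args[1] : "fileName2");
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(stripControl(line));
                writer.newLine();
            }
        } // try-with-resources closes both streams after the loop
    }
}
```

Note that \p{Cntrl} also removes tabs; readLine() has already stripped the line terminators, and newLine() writes them back.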

Query regarding Handling Unicode characters in XML

All,
My application reads a flat file as a series of bytes, and I create an XML document from the data. The data contains Unicode characters.
I use XSLT to create the XML file. I don't face any issues while creating it, but when I later try to parse the constructed XML file, I get a SAX parsing exception
(Caused by: org.xml.sax.SAXParseException: Character reference "<not clearly visible in the browser>" is an invalid XML character.)
Can someone advise on how to tackle this?
regards,
D
Edited by: user9165249 on 07-Jan-2011 08:10

How to tackle it? Don't allow your transformation to produce characters which are invalid in XML. The XML Recommendation specifies what characters are allowed and what characters aren't, in section 2.2: http://www.w3.org/TR/REC-xml/#charsets. The invalid characters can't come from the XML which you are transforming so they must be coming from code in your transformation.
And if you can't tell what the invalid characters are by using your browser, then send the result of the transformation to a file and use a hex editor to examine it.
By the way, this isn't a question about Unicode characters in XML, since all characters in Java are Unicode and XML is defined in terms of Unicode. So saying that your data contains Unicode characters is a tautology. It couldn't do anything else. If your personal definition of Unicode is "weird stuff that I don't understand" then do yourself a favour, take a couple of days out and learn what Unicode is.
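To check output against section 2.2 programmatically, here is a sketch that tests each code point against the XML 1.0 Char production and reports the offending ones with their offsets (class and method names are hypothetical). It works on decoded text, so run it on the values before serialization; a character reference such as &#1; already written into a file would need a separate text search.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: find characters outside the XML 1.0 Char production:
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
public final class XmlCharCheck {

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Describes each invalid code point with its char offset (lone surrogates are flagged too). */
    static List<String> findInvalid(String text) {
        List<String> problems = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                problems.add(String.format("offset %d: U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(findInvalid("ok\u0001ok\uFFFE")); // [offset 2: U+0001, offset 5: U+FFFE]
    }
}
```

Offsets here count UTF-16 chars, not bytes; for byte-level inspection the hex editor suggested above is still the right tool.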

Web services and control characters

Hi,
We are using JAX-WS and JAXB to create web services. We have a problem with our current data because it sometimes contains "bad" characters, such as control characters. Is there a nice way for us to remove these characters when creating the messages or when retrieving the data from the database? We use Java Persistence / Hibernate to retrieve the data from the database.
I would prefer a method that doesn't include having to "clean" each string manually...
Thanks!

Hi, I'm doing something similar to you, but in JBuilder, which is another IDE. I don't know whether this is useful for you, but I opened the JBuilder help, typed "web services" in the index, and found a topic called "export classes as webservices" that lists some steps to follow. Maybe you can find something similar in Eclipse.

Input stream with control characters

I am using the HTTPRequest to get data from a MySQL varchar(3000) field.
when I read the data I get the following error message.
Exception in thread "AWT-EventQueue-0" (1,670) com.sun.javafx.data.pull.impl.StreamException: Control character is not allowed inside string
The MySQL field does contain things like CRs, which I need to allow the user to enter.
QUESTION:
How can I capture data with control characters?

The MySQL column type is varchar(3000), which to my understanding is basically a string of up to 3000 characters.
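Assuming, as the "inside string" wording suggests, that the HTTP response is JSON: the JSON grammar (RFC 8259) does not allow raw characters U+0000 to U+001F inside a string value, so the CRs can stay in MySQL, but whatever builds the response must escape them. A minimal server-side escaper sketch (the class name is hypothetical; a JSON library on the server normally does this for you):

```java
// Sketch: quote and escape a Java string as a JSON string value.
public final class JsonEscape {

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Other control characters use the six-character escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // Prints the value quoted, with the CR and LF written as escape sequences.
        System.out.println(escape("line 1\r\nline 2"));
    }
}
```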

Remove control characters in txt file (saved from Excel)

Hi,
I have a txt file that contains invisible control characters, and I want to remove them. I've been thinking of two options:
1/ Read the content of the file into a string, then go through each character and keep only alphanumerics, new lines, and the Alt+Enter character (the line break Excel creates inside a cell, which ends up in the txt file). With this approach, I'm stuck on finding the character code for Alt+Enter, so if anyone could point it out, that would help a great deal.
2/ Use pattern matching with {cntrl} or something similar to remove all control characters. I've tried this approach and it didn't work for me.
Please help me with this problem. Any help or suggestion is greatly appreciated.

(saved from Excel) Why not save it as csv?
trivektor wrote:
With this approach, I'm stuck on getting the character code for Alt+Enter so if anyone could point out. That helps a great deal.
You can figure that out with a hex editor or just write a small app that prints int values for each byte, not character, and print the file.
Presumably you already found the Character class and its methods.
Edited by: jschell on Sep 22, 2008 4:29 PM
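Following jschell's suggestion, here is a small Java sketch that prints the int value of every byte, plus a filter for option 1. The filter removes codes 0 to 31 and 127 but keeps TAB (the column separator in Excel's tab-delimited text), LF and CR. Excel's Alt+Enter line break is commonly reported to be LF (10), but the byte dump is the way to confirm that for your file. The class name and reading the file as ISO-8859-1 are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: inspect and clean a text file saved from Excel.
public final class ExcelTextCleanup {

    /** Prints the offset and int value (0-255) of every byte. */
    static void dumpBytes(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            System.out.printf("%6d: %3d%n", i, data[i] & 0xFF);
        }
    }

    /** Removes ASCII control characters (0-31, 127) except TAB, LF and CR. */
    static String removeControlChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean control = c < 0x20 || c == 0x7F;
            if (!control || c == '\t' || c == '\n' || c == '\r') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java ExcelTextCleanup <in.txt> <out.txt>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        dumpBytes(data);
        // ISO-8859-1 maps every byte to exactly one char, so reading loses nothing.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Files.write(Paths.get(args[1]), removeControlChars(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

Only codes below 32 and 127 are treated as control characters here, so bytes 128 to 159 (printable characters such as curly quotes in windows-1252) pass through unchanged.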

[SOLVED] Why are control characters visible in text files??

Hi,
If I direct the output of 'ls' to a file like:
ls > file.txt
and open the file in an editor such as vi, joe or mousepad, it looks like this:
[[01;34m22x22[[0m
[[01;34m24x24[[0m
[[01;34m32x32[[0m
[[0mindex.theme[[0m
[[01;34mscalable[[0m
[[m
I just did the same thing on an old slackware box and no control characters are visible. The same for a recent LFS build. I realize I could probably pipe through dos2unix, but it shouldn't be happening anyway. Any input is appreciated.
Thanks
-Frank
Last edited by fianella (2007-10-24 10:37:00)

At the DOS prompt (sorry, old habits die hard :-) ), type
alias ls
and see what the results are. Compare that against your slackware or lfs builds. You will probably find that the color= option is different - most builds will not include the color ANSI sequences if you pipe the output, but if it says color=always the color codes will be included in the redirected file.
Assuming you find an alias for ls that forces color output, you need to find where that is taking place... in your ~/.bashrc file? in /etc/profile.d? And change that to alias ls='ls --color=tty' or something sensible like that.
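If a file already contains these sequences, they can also be stripped after the fact. Each one is ESC, '[', zero or more digits and semicolons, then 'm' (an ANSI SGR colour sequence); the ESC byte itself is invisible or shown as ^[ by most editors. A Java sketch (the class name is hypothetical; it only handles SGR sequences, not other terminal escapes):

```java
import java.util.regex.Pattern;

// Sketch: remove ANSI SGR (colour) sequences such as ESC[01;34m from text.
public final class StripAnsi {

    // ESC, '[', zero or more digits/semicolons, then 'm'.
    private static final Pattern SGR = Pattern.compile("\u001B\\[[0-9;]*m");

    static String strip(String s) {
        return SGR.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("\u001B[01;34m22x22\u001B[0m")); // 22x22
    }
}
```

Fixing the alias is still the better solution, since it stops the codes from being written in the first place.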
