Character conversion problem

I've written an app that's a wrapper for rsync, and one of its functions is to get a listing from the server and check whether the files exist on the local machine. My problem is with file names that contain accented characters, such as ö.
On Linux it works fine, and rsync -r lists the file as:
JörgHardt.rcd
But on Windows it lists it like this:
J\303\266rgHardt.rcd
When rsync creates the files on disk, though, it uses the correct characters, so a File.exists() check using the string above doesn't match. So my question is: is it possible to convert "\303\266" back into an ö, and if so, how?
I tried URLDecoder.decode, but it treats the sequence as two characters and returns two wrong characters instead of ö. I've tried the rsync.exe from Cygwin and from cwRsync, but they give the same output.
Thanks.

No, the issue is how to convert \303\266 into an ö. That's all I need to know.
My point above was that since the files have the correct characters and are displayed properly, Windows obviously supports Unicode perfectly well as it is. The problem is that the text output from rsync.exe doesn't show the characters, so when I read its output stream I get \303\266.
Modifying the Windows locale etc. isn't an option, and neither is modifying the rsync program. All I can do is deal with the text I get from it.
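The sequence \303\266 is the two bytes of the UTF-8 encoding of ö (0xC3 0xB6) written as backslash-octal escapes. Decoding each escape on its own yields two characters, which is consistent with the URLDecoder result; the bytes have to be collected first and then decoded together. A minimal Java sketch, assuming the escaped bytes are UTF-8 and that every backslash followed by three octal digits is an escape. Accepting an optional '#' after the backslash is an extra assumption, for rsync builds that may write \#ooo instead of \ooo.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class RsyncNameDecoder {

    /**
     * Replaces octal escapes such as "\303\266" with the raw bytes they stand for,
     * then decodes the whole byte sequence as UTF-8.
     * Assumption: the escaped bytes are UTF-8 (C3 B6 is the UTF-8 encoding of 'ö').
     */
    public static String decode(String escaped) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < escaped.length()) {
            if (escaped.charAt(i) == '\\') {
                int start = i + 1;
                // Assumed variant: some rsync versions may write "\#ooo" instead of "\ooo".
                if (start < escaped.length() && escaped.charAt(start) == '#') {
                    start++;
                }
                if (isOctalByte(escaped, start)) {
                    bytes.write(Integer.parseInt(escaped.substring(start, start + 3), 8));
                    i = start + 3;
                    continue;
                }
            }
            // Any other character is copied through as its own UTF-8 bytes.
            int cp = escaped.codePointAt(i);
            byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            bytes.write(utf8, 0, utf8.length);
            i += Character.charCount(cp);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    /** True if three octal digits start at 'from' and their value fits in a byte (at most 0377). */
    private static boolean isOctalByte(String s, int from) {
        if (from + 3 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 3; k++) {
            char d = s.charAt(k);
            if (d < '0' || d > '7') {
                return false;
            }
        }
        return s.charAt(from) <= '3';
    }

    public static void main(String[] args) {
        String name = decode("J\\303\\266rgHardt.rcd");
        System.out.println(name.equals("J\u00f6rgHardt.rcd")); // true
    }
}
```

A hypothetical usage would be new File(localDir, RsyncNameDecoder.decode(nameFromListing)).exists(), where localDir and nameFromListing are placeholders. Note that a file name which really contains a backslash followed by three octal digits would be misread by this sketch.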

Similar Messages

  • Character conversion problems when calling FM via RFC from Unicode ECC 6.0?

    Hi all,
    I am facing a Cyrillic character conversion problem when calling an RFC function from R/3 ECC 6.0 (set up as a Unicode system, code page 4103). My target system is R/3 4.6C with default code page 1500.
    The parameter I use in my FM interface in the target system is of type CHAR10 (single-byte, obviously).
    I have defined the RFC connection (SM59) as an ABAP connection, and the client, logon language, user and password are supplied.
    The problem is that Cyrillic characters arrive as '#' in the target system ('#' is set as the default substitute character in the RFC destination definition for when a character conversion error occurs).
    Checking conversions between code page 4103 and target code page 1500 in my source system with the tools of transaction i18n shows no errors, meaning the conversion itself works. It seems that the default character conversion performed by the source system within the scope of the RFC destination definition is doing something wrong.
    I also experimented with the MDMP & Unicode settings within the RFC destination definition, without success, perhaps because there is little documentation on how to set and manage these parameters.
    My question is: does anyone have experience with conversion between Unicode and non-Unicode systems via RFC calls (with a non-English target, necessarily!), or can anyone share useful information on this issue? What has to be set in the RFC destination to get character conversion working? Is it acceptable to use character parameters in the target function module interface at all?
    Many thanks in advance.
    Regards,
    Ivaylo Mutafchiev
    Senior SAP ABAP Consultant

    Hey,
    I had a similar experience. I was interfacing between 4.6 (RFC), PI and ECC 6.0 (ABAP proxy). When data was passed from ECC to 4.6, the RFC received it incorrectly. So I had to send trimmed strings from ECC and receive them as strings in the RFC (especially for CURR and QUAN fields). Also, the receiver communication channel in PI (between PI and the RFC) had to be set as non-Unicode. This helped a bit, but I still have two issues: truncated values and some extra digits. The changes above did, however, resolve the unwanted characters such as "<" and "#". You can find a related post under my ID. Hope this info helps.
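    The '#' substitution described in the question can be reproduced outside SAP. This Java sketch is only an analogy, not SAP's RFC conversion: it encodes three Cyrillic letters with a CharsetEncoder whose replacement byte is '#'. ISO-8859-1 stands in for any code page without Cyrillic letters, so each letter becomes '#', while UTF-8 keeps them. Seeing '#' in the target therefore suggests that some conversion step on the way used a code page lacking these letters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class HashSubstitutionDemo {

    /** Encodes text into the target charset, writing '#' for every character it cannot represent. */
    public static byte[] encodeWithHash(String text, Charset target) {
        try {
            ByteBuffer out = target.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .replaceWith(new byte[] {(byte) '#'})
                    .encode(CharBuffer.wrap(text));
            byte[] result = new byte[out.remaining()];
            out.get(result);
            return result;
        } catch (CharacterCodingException e) {
            // Not expected: with REPLACE, coding errors are substituted rather than thrown.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String cyrillic = "\u041c\u0438\u0440"; // three Cyrillic letters
        byte[] latin1 = encodeWithHash(cyrillic, StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ###
        byte[] utf8 = encodeWithHash(cyrillic, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 6: each letter survives as two UTF-8 bytes
    }
}
```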

  • Character Converting Problem at File Adapter

    I am having a character conversion problem bringing an XML file into XI.
    It is converting the data incorrectly.
    I have a file adapter that picks up Audit.xml (encoding UTF-8) in binary mode.
    XI takes the file and converts it into a text file.
    I compared the hex of the source file Audit.xml with that of Audit.txt, and a few specific characters are different.
    See the attached image for details.
    Please let me know if any additional information would help.
    http://i9.photobucket.com/albums/a68/tkc204/SAP%20XI/Audit.png

    Actually, in this case Christian is right: the source XML that I was receiving wasn't valid UTF-8. The program generating the XML just put the encoding="utf-8" declaration at the top because that's what its code told it to do.
    The actual data in the XML is Windows-1252. So when XI reads it, it follows the UTF-8 declaration at the top and expects UTF-8, but it gets something else, so the unsupported characters get converted.
    The hex of the source XML was different from what XI generated after the file adapter.
    After changing the output to Windows-1252, it changes back to the expected hex:
    Source - Windows-1252
    XI - uses UTF-8 (no way to change it)
    Target - forced to be created in Windows-1252
    Another helpful tool I used: the W3C Validator. I uploaded the file and found that I wasn't receiving UTF-8.
    I hope this is clear and helps someone else in the future.
    Thanks for everyone's input.
    Chirag
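    The check done with the W3C Validator can also be done in code. A minimal Java sketch: a strict UTF-8 decoder (CodingErrorAction.REPORT) rejects bytes that are not well-formed UTF-8, and the helper then falls back to Windows-1252. The fallback charset is an assumption taken from this thread, and the test is a heuristic, since some Windows-1252 byte combinations also happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DeclaredEncodingCheck {

    /** Returns true only if every byte sequence in data is well-formed UTF-8. */
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes as UTF-8 when valid, otherwise as Windows-1252 (the assumed real source encoding). */
    public static String decodeWithFallback(byte[] data) {
        Charset charset = isValidUtf8(data) ? StandardCharsets.UTF_8 : Charset.forName("windows-1252");
        return new String(data, charset);
    }

    public static void main(String[] args) {
        byte[] cp1252 = {'c', 'a', 'f', (byte) 0xE9};            // "café" written in Windows-1252
        byte[] utf8 = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9}; // "café" written in UTF-8
        System.out.println(isValidUtf8(cp1252)); // false
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(decodeWithFallback(cp1252).equals("caf\u00e9")); // true
    }
}
```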

  • Problem in the character conversion

    Hi Guys,
    I am facing a problem with character conversion.
    I am posting data from SAP to a third-party system using XI, converting the whole input message to a String. I am using the SOAP adapter for communication between XI and the third-party system.
    The third-party system needs the String wrapped in CDATA so that it does not choke on the special characters. I did wrap the output string in CDATA using an ABAP mapping, but when I do that, XI converts the angle brackets < and > into &lt and &gt. My assumption is that it is double encoding.
    Example:
    before mapping: <AppSystemInfo>
    after mapping it is converted to: <![CDATA[ &ltAppSystemInfo&gt]]>
    Edited by: Vamsi on Jun 17, 2010 10:00 PM
    Edited by: Vamsi on Jun 17, 2010 10:01 PM

    Did you look at the actual output?
    If you are checking this in the mapping test, it will show the result like this, because the test displays the output as XML and escapes the special characters so they don't break it.
    Try running the interface end to end and check how the data looks on the receiver side.
    Thanks,
    Hetal
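    For reference, a minimal Java sketch of wrapping a string in a CDATA section (wrap is a hypothetical helper, not an XI API). It assumes the text passed in has not already been escaped: the output shown in the question, with &lt and &gt inside the CDATA section, looks as if the escaping happened before the wrapping. The only sequence a CDATA section cannot contain is ]]>, so the helper splits it across two adjacent sections.

```java
public class CdataWrapper {

    /**
     * Wraps raw (unescaped) text in a CDATA section. Any "]]>" in the text is split
     * across two adjacent CDATA sections, because it would otherwise end the section early.
     */
    public static String wrap(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<AppSystemInfo>")); // <![CDATA[<AppSystemInfo>]]>
    }
}
```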

  • Oracle to Mysql character set conversion problem!!! PLZ IGNORE

    Hi Experts,
    I have created a database link from Oracle 10g to Mysql 5.
    I have installed Oracle Gateway 11g for this purpose.
    When I retrieve the data from SQL*Plus, the text is displayed as question marks.
    Oracle 10g database character set is WE8MSWIN1252
    MySQL character set ---> latin1
    Character set of the ODBC connector for MySQL is latin7
    Character set in the parameter file of the HS folder is WE8MSWIN1252
    When I retrieve data from SQL Developer, the text is fine (I think it takes the character set of the target directly), but
    when I log in from SQL*Plus, I get question marks!
    I have another post in the Heterogeneous Connectivity forum:
    Re: Oracle to Mysql character set conversion problem!!! PLZ HELP
    Kindly add your comments there.
    Appreciate your help,
    regards
    Edited by: user10243788 on Apr 21, 2010 3:25 AM

    It is OK to post a globalization-related question in this forum in addition to the forum pertaining to the main technology. Not all experts follow all possible forums on OTN. Of course, you should cross-link the posts to let people merge the answers.
    Regarding the problem itself, make sure that SQL*Plus has the right NLS_LANG setting in the environment. On Windows, in the Command Prompt:
    C:\> set NLS_LANG=.WE8PC850
    C:\> sqlplus ...
    On Unix:
    $ setenv NLS_LANG .WE8ISO8859P1   (or NLS_LANG=.WE8ISO8859P1; export NLS_LANG)
    $ sqlplus ...
    -- Sergiusz
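    Why the client setting matters: the same character has different byte values in different code pages. This Java sketch is an illustration, not Oracle code. It compares é in code page 850 (the code page that .WE8PC850 names, and the OEM code page Command Prompts use in many Western European installations; chcp shows the active one) with Windows-1252, the database character set in this thread. It assumes the JDK's IBM850 charset is available.

```java
import java.nio.charset.Charset;

public class ConsoleCodePageDemo {

    /** Returns the single byte (0-255) that a one-character string becomes in the given charset. */
    public static int byteFor(String charsetName, String oneChar) {
        byte[] encoded = oneChar.getBytes(Charset.forName(charsetName));
        return encoded[0] & 0xFF;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // é
        System.out.printf("IBM850: %02X%n", byteFor("IBM850", eAcute));             // IBM850: 82
        System.out.printf("windows-1252: %02X%n", byteFor("windows-1252", eAcute)); // windows-1252: E9
        // The code page 850 byte read back as Windows-1252 is a different character entirely.
        String misread = new String(new byte[] {(byte) 0x82}, Charset.forName("windows-1252"));
        System.out.printf("U+%04X%n", (int) misread.charAt(0)); // U+201A
    }
}
```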

  • Printing ZPL (Zebra) data to printer spooler without character conversion

    Hi all,
    We are printing shipping labels from UPS, using a process where we receive the ZPL label code directly from UPS and just need to pass the data to the printer to get the labels. We have already implemented this with Fedex and some custom labels, and it works perfectly. The problem with the UPS label data is that it contains non-printable characters (in the MaxiCode data field). When passed to the SAP printer spooler (see the code example below), the data gets corrupted because SAP interprets these non-printable characters as printer control codes.
    I have verified this by saving the ZPL data to a local file, before printing it through the SAP spooler. I then print this raw data and compare the output with the labels printed from the spooler. The MaxiCode (the big 2D barcode) is different in these labels. UPS has also tested the labels, and rejected them because of incorrect data in the barcode.
    For printing, we are using printers defined as type "PLAIN", but I also tried using the "LZEB2" device type with the same result. The error we see in the spooler entry is this:
    Print ctrl S_0D_ is not defined for this printer. Page 1, line 2, col. 2201
    Print output may not be as intended
    The printer control code differs depending on the label. I have examined the spooler data in "raw" mode, and there is always an ASCII character 28 (hex 1C) in front of the characters that SAP thinks are control codes, which is why I think these non-printable characters are causing the problems.
    This is the function module I use to print the ZPL data (and, as stated above, it works fine for Fedex and custom labels). The ZPL data is converted to binary format before being passed to the function module; I also tried sending the data in text format with another FM, with the same result. I have experimented with the "codepage" parameter, and the value below gives the fewest errors; some labels actually get through without errors. But at least 50% of the labels still get corrupted, with log entries like the one above.
    CALL FUNCTION 'RSPO_SR_WRITE_BINARY'
          EXPORTING
            handle           = lv_spool_handle
            data             = lv_label_line_bin
            length           = lv_len
            codepage         = '2010'
          EXCEPTIONS
            handle_not_valid = 1
            operation_failed = 2
            OTHERS           = 3.
    Does anyone know if there is a way to send data to the spooler without character conversion or interpretation of printer control codes? Or is there any other smart way to get around this problem?
    /Leif

    I write more directly to the spooler, to avoid any issues with the WRITE statement and SAP's report output processing. At the same time, I insert line breaks so that the output is easy to debug in the spooler if needed. Also included is the code to detect the escape code (ASCII 28) and to insert the control code ZZUPS at that point (you can skip this for Fedex). Here's a simplified example, but please note that it is for a Unicode system; some minor changes are required in a non-Unicode system.
    CONSTANTS: lc_spcode TYPE c LENGTH 5 VALUE 'ZZUPS',
               lc_xlen TYPE i VALUE 5.
       DATA: lv_print_params TYPE pri_params,
             lv_spool_handle TYPE sy-tabix,
             lv_name TYPE tsp01-rq0name,
             lv_spool_id TYPE rspoid,
             lv_crlf(2) TYPE c,
             lv_lf TYPE c,
             lstr_label_data TYPE zship_label_data_s,
             lv_label_line TYPE char512,
             lv_label_line_bin TYPE x LENGTH 1024,
             lv_len TYPE i,
             ltab_label_data_255 TYPE TABLE OF char512,
             ltab_label_data TYPE TABLE OF x,
             lv_c1 TYPE i,
             lv_c2 TYPE i,
             lv_cnt1 TYPE i,
             lv_cnt2 TYPE i,
             lv_x(2) TYPE x.
       FIELD-SYMBOLS: <n> TYPE x.
       lv_crlf = cl_abap_char_utilities=>cr_lf.
       lv_lf = lv_crlf+1(1).
       lv_name = 'ZPLLBL'.
    CALL FUNCTION 'RSPO_SR_OPEN'
         EXPORTING
           dest                   = i_dest
           name                   = lv_name
           prio                   = '5'
           immediate_print        = 'X'
           titleline              = i_title
           receiver               = sy-uname
    *      lifetime               = '0'
           doctype                = ''
         IMPORTING
           handle                 = lv_spool_handle
           spoolid                = lv_spool_id
         EXCEPTIONS
           device_missing         = 1
           name_twice             = 2
           no_such_device         = 3
           operation_failed       = 4
           OTHERS                 = 5.
       IF sy-subrc <> 0.
         RAISE spool_open_failed.
       ENDIF.
    LOOP AT i_label_data INTO lstr_label_data.
         CLEAR ltab_label_data_255.
         SPLIT lstr_label_data-label_data AT lv_lf INTO TABLE ltab_label_data_255.
         LOOP AT ltab_label_data_255 INTO lv_label_line.
           IF lv_label_line NE ''.
             lv_len = STRLEN( lv_label_line ).
    *       Convert character to hex type
             lv_c1 = 0.
             lv_c2 = 0.
             DO lv_len TIMES.
               ASSIGN lv_label_line+lv_c1(1) TO <n> CASTING.
               MOVE <n> TO lv_x.
               IF lv_x = 28.
                 lv_cnt1 = 0.
                 lv_label_line_bin+lv_c2(1) = lv_x.
                 lv_c2 = lv_c2 + 1.
                 DO lc_xlen TIMES.
                   ASSIGN lc_spcode+lv_cnt1(1) TO <n> CASTING.
                   MOVE <n> TO lv_x.
                   lv_cnt2 = lv_c2 + lv_cnt1.
                   lv_label_line_bin+lv_c2(2) = lv_x.
                   lv_c2 = lv_c2 + 2.
                   lv_cnt1 = lv_cnt1 + 1.
                   lv_len = lv_len + 1.
                 ENDDO.
               ELSE.
                 lv_label_line_bin+lv_c2(2) = lv_x.
                 lv_c2 = lv_c2 + 2.
               ENDIF.
               lv_c1 = lv_c1 + 1.
             ENDDO.
    *       Print binary data to spool
             lv_len = lv_len * 2. "Unicode is 2 bytes per character
             CALL FUNCTION 'RSPO_SR_WRITE_BINARY'
               EXPORTING
                 handle                 = lv_spool_handle
                 data                   = lv_label_line_bin
                 LENGTH                 = lv_len
               EXCEPTIONS
                 handle_not_valid       = 1
                 operation_failed       = 2
                 OTHERS                 = 3.
             IF sy-subrc <> 0.
               RAISE spool_write_failed.
             ENDIF.
           ENDIF.
         ENDLOOP.
       ENDLOOP.
       CALL FUNCTION 'RSPO_SR_CLOSE'
         EXPORTING
           handle = lv_spool_handle.
       IF sy-subrc <> 0.
         RAISE spool_close_failed.
       ENDIF.
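    The escape handling above can be sketched at the byte level in Java. This is not the ABAP code and not an SAP API: it assumes single-byte ASCII label data (unlike the Unicode ABAP version) and simply writes the marker ZZUPS right after every 0x1C byte, which is what the ABAP loop appears to do. That SAP then treats "0x1C ZZUPS" as a print control defined in the device type is an assumption the thread does not show.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ZplEscapeMarker {

    // ASCII 28, which (per the question) the spooler reads as the start of a print control.
    private static final byte FS = 0x1C;
    private static final byte[] MARKER = "ZZUPS".getBytes(StandardCharsets.US_ASCII);

    /** Copies the label bytes, writing the ZZUPS marker right after every 0x1C byte. */
    public static byte[] markEscapes(byte[] label) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(label.length + 16);
        for (byte b : label) {
            out.write(b);
            if (b == FS) {
                out.write(MARKER, 0, MARKER.length);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] marked = markEscapes(new byte[] {'A', 0x1C, 'B'});
        System.out.println(marked.length); // 8
    }
}
```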

  • Error occurred during character conversion in SXMB_MONI

    Hello Experts,
    Good Day!
    I would like to ask for your help. When I use transaction SXMB_MONI to search for messages, I get this error: Error occurred during character conversion.
    The program itself has no problems; it works for all other dates. Only for one particular day and a specific time period do I get this error.
    Does anyone know what this error means? Please reply.
    Thanks for your help.
    Looking forward to your replies.

    Hi Presheela,
    Basically, this problem occurs when your payload contains special characters such as '&', '>', etc., so you have to take care of how these characters are handled in XI.
    Refer to the following documentation:
    How to Work with Character Encodings in Process Integration
    https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/502991a2-45d9-2910-d99f-8aba5d79fb42
    Regards,
    Vinod.

  • Character conversion error

    Hi all,
    We are getting the following error when trying to parse an XML string resource: Character conversion error: "Illegal ASCII character, 0xc2" (line number may be too low). We have not been able to get around this. We have tried creating the InputSource in two different ways:
    reader = new StringReader(stringSource);
    src = new InputSource( reader );
    and
    src = new InputSource(new InputStreamReader(new ByteArrayInputStream(stringSource.getBytes())));
    The problem does appear to go away if we treat the DTD we are validating against as a file. If we set it as a URI, we get the above problem.
    Is anyone else experiencing this problem?
    Any help would be greatly appreciated.
    Thanks in advance,
    Greg

    Hi,
    2 possible solutions:
    1) Try using the Xerces parser instead of Sun's parser.
    2) Look at the posting at the following URL and see whether the solution posted there solves your problem: http://forums.java.sun.com/thread.jsp?forum=34&thread=67558
    Hope this helps,
    Kurt.
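    On the second variant in the question: stringSource.getBytes() encodes with the platform default charset, and InputStreamReader decodes with that same default, so that round trip can only lose characters the default charset cannot represent. A minimal Java sketch of the two usual ways to parse XML held in a String (class and method names are hypothetical): hand the parser characters through a StringReader, or encode with an explicit charset and declare it with InputSource.setEncoding. This covers only the String itself; according to the question, the error appears when the DTD is referenced by URI, and that external entity is read separately.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ParseFromString {

    /** Preferred: give the parser characters, so no byte encoding is involved at all. */
    public static String rootTextViaReader(String xml) {
        return rootText(new InputSource(new StringReader(xml)));
    }

    /** If bytes are needed, encode with an explicit charset and tell the parser which one. */
    public static String rootTextViaBytes(String xml) {
        InputSource src = new InputSource(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        src.setEncoding("UTF-8");
        return rootText(src);
    }

    /** Parses the source and returns the text content of the root element. */
    private static String rootText(InputSource src) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(src).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootTextViaBytes("<a>caf\u00e9</a>").equals("caf\u00e9")); // true
    }
}
```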

  • "character conversion error" while parsing xml files

    Hello,
    I'm trying to parse MusicXML (Recordare) files, but I'm getting an exception.
    I'm using the SAX parser (javax.xml.parsers.SAXParser).
    Here is the code I use to instantiate it:
    final javax.xml.parsers.SAXParserFactory saxParserFactory = javax.xml.parsers.SAXParserFactory.newInstance();
    final javax.xml.parsers.SAXParser saxParser = saxParserFactory.newSAXParser();
    final org.xml.sax.XMLReader parser = saxParser.getXMLReader();
    I'm using my own handler, but I get the same exception even if I use org.xml.sax.helpers.DefaultHandler.
    The error I get is:
    Character conversion error: "Illegal ASCII character, 0xc2" (line number may be too low).
    The first few lines of my xml files look like this:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <!DOCTYPE score-partwise
    PUBLIC "-//Recordare//DTD MusicXML 0.6 Partwise//EN"
    "http://www.musicxml.org/dtds/partwise.dtd">
    <score-partwise>
    [...etc...]
    If I delete the <!DOCTYPE ...> line, I don't get the exception anymore. But the MusicXML files I get (from another program) always contain this line, and it would be quite a lot of work to delete it from every file manually.
    So does anyone know if there is a way to avoid deleting that line in every file, while still being able to parse the XML files without exceptions?
    Or does anyone know the exact cause of the exception? I don't know what exactly causes it.
    Thank you in advance.
    Greetz,
    Jipo

    > So does anyone know if there is a way to avoid deleting that line in every file, while still being able to parse the xml files without exceptions?
    OK, this is side-stepping the real problem, but I've used this code to filter out DTD references for other reasons:
    import java.io.*;
    import java.nio.charset.StandardCharsets;

    public static InputStream filterOutDTDRef(InputStream in) throws IOException {
        // Read with the encoding the files declare (UTF-8), not the platform default
        BufferedReader iniReader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        StringBuilder newXML = new StringBuilder();
        for (String line = iniReader.readLine(); line != null; line = iniReader.readLine()) {
            newXML.append(line).append('\n');
        }
        in.close();
        // Cut everything from "<!DOCTYPE " up to and including the next '>'
        int s = newXML.indexOf("<!DOCTYPE ");
        if (s != -1) {
            newXML.replace(s, newXML.indexOf(">", s) + 1, "");
        }
        // Encode with the same charset so the bytes still match the XML declaration
        return new ByteArrayInputStream(newXML.toString().getBytes(StandardCharsets.UTF_8));
    }
    It actually speeds up the parsing phase too, since the DTD references point to the web and the parser fetches the DTD for every XML file it parses.
    You can feed the returned stream into the InputSource constructor that takes an InputStream argument.
    Now for the real problem... As a Unicode code point, 0xC2 is "LATIN CAPITAL LETTER A WITH CIRCUMFLEX", which is not an ASCII character (as the error message correctly reports). As a byte in a UTF-8 file, though, 0xC2 is the first byte of a two-byte sequence (for example, C2 A0 is a no-break space), so it looks like UTF-8 data being read as ASCII. I'm not sure why the file is being parsed as ASCII, though. You could try passing a FileReader to the InputSource and hope that it picks up your system's default character encoding and that this encoding matches the file. Or you could try passing in an InputStreamReader constructed with an explicit character encoding (e.g. "UTF8") and see if that does the trick.
    asjf
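    Another way to avoid deleting the DOCTYPE line by hand (a different technique from the filter in the reply) is to install an org.xml.sax.EntityResolver that answers the request for the external DTD with an empty document, so the parser never fetches or decodes it. A minimal sketch; the element count only shows that parsing succeeds. It assumes the documents don't rely on entities or default attribute values declared in the DTD, since those would no longer be available.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtd {

    /** Parses xml without fetching its external DTD and returns the number of elements seen. */
    public static int countElements(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Answer every external entity request (including the DOCTYPE's DTD) with an empty document.
            reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            int[] count = {0};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    With the XMLReader from the question, the equivalent is a single call before parsing: parser.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader(""))).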

  • Character conversion error: Unconvertible UTF-8 character beginning..

    Hello,
    I'm using TrAX for XSLT transformations and have the following problem:
    Character conversion error: "Unconvertible UTF-8 character beginning with 0xa9" (line number may be too low).
    org.xml.sax.SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with 0xa9" (line number may be too low).
            at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
            at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
            at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
            at org.apache.crimson.parser.Parser2.maybeTextDecl(Parser2.java:2795)
            at org.apache.crimson.parser.Parser2.externalParameterEntity(Parser2.java:2880)
            at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1167)
            at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:489)
            at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
            at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
            at mlts.converter.XMLImport.outputESGML(XMLImport.java:311)
            at mlts.converter.Converter.processFile(Converter.java:312)
            at mlts.converter.Converter.Convert(Converter.java:229)
            at test.main(test.java:7)
    Following the source code, I've found that the exception is thrown
    when it reads the DTD. I tried reading the DTD through an InputSource
    as ASCII and as Latin-1, and my program reads it without any problem.
    I'd really appreciate any help,
    Thanks

    http://forum.java.sun.com/thread.jsp?forum=34&thread=254927
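    Since the DTD reads fine as Latin-1, another option is to give the parser the DTD already decoded. A minimal Java sketch, assuming the DTD is an ISO-8859-1 file without a text declaration: 0xA9 is © in ISO-8859-1, but it is a UTF-8 continuation byte and cannot start a sequence, and UTF-8 is what a parser assumes when an entity has no declaration. The EntityResolver below serves local copies of the DTDs, looked up by file name (a hypothetical scheme), as an ISO-8859-1 character stream. Alternatively, a text declaration such as <?xml encoding="ISO-8859-1"?> at the top of the DTD tells the parser its encoding directly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class Latin1DtdResolver {

    /**
     * Parses xml, serving DTDs from local copies (keyed by file name) and decoding them
     * as ISO-8859-1. Returns the number of elements seen.
     */
    public static int countElements(String xml, Map<String, byte[]> localDtds) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setEntityResolver((publicId, systemId) -> {
                String fileName = systemId == null ? "" : systemId.substring(systemId.lastIndexOf('/') + 1);
                byte[] dtd = localDtds.get(fileName);
                if (dtd == null) {
                    return null; // not a known DTD: let the parser resolve it the usual way
                }
                // A character stream means the parser does no byte decoding of its own.
                InputSource source = new InputSource(new InputStreamReader(
                        new ByteArrayInputStream(dtd), StandardCharsets.ISO_8859_1));
                source.setSystemId(systemId);
                return source;
            });
            int[] count = {0};
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return count[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```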

  • Oracle 8i us7ascii character set problem - help required urgent.

    Hi friends,
    I have an Oracle 8i database server installed on Sun Solaris. The database character set is US7ASCII. In one of the tables, TIFF images are stored in a LONG column. I'm trying to fetch these images using the Oracle 9i client and Visual Basic (Oracle ODBC drivers), but I'm unable to do so: I cannot fetch special characters.
    Is it because of the character set? When I run my code on the server itself, I'm able to fetch the images. I also tried to fetch the images using the Oracle 8i client on a Windows XP machine, but could not. Are there any special settings I have to make on the client side?

    Indeed, it's an ODBC issue. Read this statement from Oracle:
    From ODBC 8.1.7.2.0 drivers onwards it's NOT possible any more to
    "disable" Characterset conversion by specifying for the NLS_LANG
    the same characterset as the database characterset. There is now
    ALWAYS a check to see if a codepoint is valid for that characterset.
    Typically you will encounter problems if you upgrade an environment
    that has NO NLS_LANG set on the client (or US7ASCII) and the database
    was also US7ASCII. This incorrect setup allowed you to store characters
    like èçàé in an US7ASCII database, with the new 8i drivers this is not possible
    any more.
    Basic problem is the 'wrong' characterset US7ASCII in the database. As long as no characterset conversion happens (that's the case on the unix server), special characters are no problem.
    Werner
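    Werner's point can be illustrated in Java (an analogy, not the ODBC driver's code): once bytes go through a US7ASCII character conversion, every byte above 0x7F is invalid and gets replaced. Binary data such as a TIFF image is full of such bytes, so it cannot survive the trip. In the sketch, decoding turns the high byte into U+FFFD, and encoding back writes '?'.

```java
import java.nio.charset.StandardCharsets;

public class AsciiRoundTrip {

    /** Decodes bytes as US-ASCII and encodes them back, as a character set conversion step would. */
    public static byte[] roundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.US_ASCII); // bytes >= 0x80 become U+FFFD
        return text.getBytes(StandardCharsets.US_ASCII);           // U+FFFD has no ASCII form: written as '?'
    }

    public static void main(String[] args) {
        byte[] tiffStart = {0x49, 0x49, 0x2A, 0x00, (byte) 0xE8}; // "II*\0" header plus one high byte
        byte[] back = roundTrip(tiffStart);
        System.out.printf("%02X -> %02X%n", tiffStart[4] & 0xFF, back[4] & 0xFF); // E8 -> 3F
    }
}
```

    This is why binary data normally goes into a binary column type such as LONG RAW or BLOB, which is not subject to character set conversion.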

  • SAXParseException: character conversion error: Illegal character 0x9A...

    This is my problem:
    I use JDOM to parse a remote XML document with a DTD linked to it, but I get that error. The request is:
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build(new URL(url));
    This works fine when I use these XML and DTD documents locally, that is, when I pass the XML file name as a parameter on the command line. Then everything goes well. But when I move my program to the server and try to run it there, a SAXParseException is thrown. Why?
    Error is:
    error on line 1 of document "http://server.net/doc.dtd" Character conversion error: Illegal ASCII character 0x9A (line number may be too low)
    What does this mean? And why does it only happen when I run the program on the server? Help, please.
    tia J_J

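    That's exactly the problem.
    Here is the String class:
    /**
     * This class is implemented to map an ordinary java.lang.String
     * into an XML-compliant String.
     */
    public class String2Xml {

        // Characters to replace, index-aligned with replaceChars. The first pair in the
        // posting was unreadable (mis-encoded) and is left out; "&" -> "&amp;" is added as
        // an assumption and must come first, so the entities inserted later are not escaped again.
        private final String[] invalidChars = {"&",
                                               "\u00DF",  // ß
                                               "\u00E4",  // ä
                                               "\u00F6",  // ö
                                               "\u00FC",  // ü
                                               "\u00C4",  // Ä
                                               "\u00D6",  // Ö
                                               "\u00DC",  // Ü
                                               "\u00A7",  // §
                                               "\u20AC",  // €
                                               "`",
                                               "\u00B4",  // ´
                                               "<",
                                               ">",
                                               "'"};
        private final String[] replaceChars = {"&amp;",
                                               "&#223;",
                                               "&#228;",
                                               "&#246;",
                                               "&#252;",
                                               "&#196;",
                                               "&#214;",
                                               "&#220;",
                                               "&#167;",
                                               "&#8364;", // the posting had &#128; (the Windows-1252 byte value), which is not € in Unicode
                                               "&#96;",
                                               "&#180;",
                                               "&lt;",
                                               "&gt;",
                                               "&apos;"};

        /** Constructor */
        public String2Xml() {
        }

        /**
         * Checks whether the given String contains any of the invalidChars;
         * every occurrence found is replaced.
         * @return String - the correct XML String
         */
        public String checkString(String check) {
            for (int i = 0; i < invalidChars.length; i++) {
                // replace() matches literally; replaceAll() would interpret the text as a regex
                check = check.replace(invalidChars[i], replaceChars[i]);
            }
            System.out.println("Check : " + check);
            return check;
        }
    }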

  • UDF for Special Character Conversion

    Hello All,
    Can anyone help me with the code for a UDF that converts special characters?
    That is, if a special character is encountered, it should be replaced with a blank space.
    << Moderator message - Everyone's problem is important >>
    Many thanks,
    Rahul.
    Edited by: Rob Burbank on Oct 29, 2010 4:32 PM

    Hi Rahul ,
    The best way to deal with special characters is to use the proper encoding in the sender communication channel itself, so that your payload contains the correct values.
    If you are getting special characters that are not covered by UTF-8 as you receive them, you can use the encoding ISO-8859-1 instead. The help documentation describes how to set the encoding in a communication channel.
    Regards,
    Saurabh
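    If the data cannot be fixed at the channel level, the replacement asked for in the question could look like the following minimal Java sketch (PI user-defined functions are written in Java). What counts as a "special character" is an assumption here: anything other than an ASCII letter, an ASCII digit, or whitespace. The UDF signature itself is not shown; the helper would be called from the UDF body.

```java
public class SpecialCharUdf {

    /**
     * Replaces every character that is not an ASCII letter, an ASCII digit or whitespace
     * with a blank. What counts as "special" is an assumption: widen the class as needed.
     */
    public static String replaceSpecialChars(String input) {
        if (input == null) {
            return null; // leave missing values alone
        }
        return input.replaceAll("[^A-Za-z0-9\\s]", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + replaceSpecialChars("Order #42: done!") + "]"); // [Order  42  done ]
    }
}
```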

  • XMLELEMENT and suppress character conversion for brackets etc...

    Hi,
    I want use XMLElement, XMLAgg to create a large XML-file from database.
    However, the problem is that I already SELECT valid XML chunks from
    the database. In this case I don't want any character conversion, e.g.
    open angle bracket (<) becomes &lt;
    quotation mark (") becomes &quot;
    and so on.
    Is this possible with Oracle 9.2?

    Hi,
    OK, I think I have finally found a solution to my problem, which is:
    printing a large XML file in one transaction within SQL*Plus.
    I let a function create the XML and write it to a CLOB variable.
    In SQL*Plus I simply do:
    -- 1 GB is acceptable
    SET LONG 1073741824;
    SPOOL myspoolfile
    SELECT MYFUNTION FROM DUAL;
    I still have to test it with large volumes of data.

  • SAXParser character conversion

    I have a very peculiar problem; could you please help?
    Original message: Kevätsunnuntaisin lentää
    The following is the encoding of the first 7 characters
    4b 65 76 c3 a4 74 73 75 – in UTF-8 this is the encoding desired
    4b 65 76 a3 74 73 75 – in the Parser - faulty encoding
    InputSource inputSource = new InputSource(myInputStream);
    inputSource.setEncoding("UTF-8");
    parser.parse(inputSource);
    The original string gets converted to "Kev£tsunnuntaisin lent££". Also, one byte is lost.
    Could you please guide me where I am going wrong? What must I do to avoid this character conversion?
    Thanks for your help!!!

    Yes, that's wrong. The bytes "c3 a4" should be decoded via UTF-8 to the character "e4", not "a3".
    It seems to me that if you pass an InputStream to the parser, you don't need to tell the parser what encoding to use. It should normally be able to figure that out by looking at the prolog of the document being input. But if it's really encoded in UTF-8, it shouldn't do any harm to tell the parser that.
    You are passing an InputStream to the parser, are you not? Your variable name suggests that but there's no code to prove it. You could try passing it directly to the parser and not wrapping it in an InputSource.
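    One way to narrow this down: feed the parser known-good UTF-8 bytes directly from an InputStream, as suggested above, and check that ä comes out as U+00E4. A minimal Java sketch (class and method names are hypothetical); if this works in the same environment, the a3 byte more likely comes from wherever myInputStream is produced than from the parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class Utf8SaxCheck {

    /** Parses UTF-8 XML bytes straight from an InputStream and returns all character data. */
    public static String textOf(byte[] utf8Xml) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(utf8Xml),
                    new DefaultHandler() {
                        @Override
                        public void characters(char[] ch, int start, int length) {
                            text.append(ch, start, length); // may be called in several chunks
                        }
                    });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String message = "Kev\u00e4tsunnuntaisin lent\u00e4\u00e4";
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><m>" + message + "</m>")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(textOf(xml).equals(message));                 // true
        System.out.printf("%04X%n", (int) textOf(xml).charAt(3));        // 00E4
    }
}
```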
