Converting Filename Encoding

Hello, I am trying to extract a .zip archive that contains CJK characters in its filenames. It was most likely created on a Windows machine.
I tried the unzip utility and it produced invalid symbols; 7za did the same, but with slightly different symbols. My LANG variable was set to en_US.UTF-8, and setting it to ja_JP.ujis seems to have no effect. I'm assuming this means the CJK filenames were stored in the archive in an encoding other than UTF-8, and that I need to convert them to UTF-8 for them to display properly.
I know of convmv, and I used a shell script to try every encoding from `convmv --list`, to no avail. I have the Unicode equivalents of most of the filenames, but in a form that makes renaming all of them by hand cumbersome; I can, however, use them to verify whether a conversion was successful.
By examining a hex dump of the ls output and deducing from character positions, I concluded that U+4EBA (人) is represented as 0xC9 0x6C in the unzip output and as 0xC2 0x90 0x6C in the 7za output. This also means I may not be dealing with the original encoding in the first place.
So, why would two zip extractors produce different results, and are there any other leads that could help me convert these filenames to UTF-8 correctly?
Last edited by Cinolt (2011-04-14 00:26:41)

Well, it looks like running WinRAR (under Wine) with ja_JP.utf8 works.
If Wine can make it work, there must be a way to do it natively in Linux. Even though I can now extract the files correctly, I'd still appreciate it if somebody could teach me the "proper" way to do it.
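The byte patterns in the first post fit a specific double misreading, which the sketch below reverses. It assumes, without confirmation from the thread, that the archive stores names in Windows code page 932 (the Shift_JIS family), where 人 is 0x90 0x6C. Under that assumption, unzip read the bytes as code page 437 (where 0x90 is É, U+00C9) and wrote ISO-8859-1, giving 0xC9 0x6C, while 7za read them as ISO-8859-1 (0x90 becomes U+0090) and wrote UTF-8, giving 0xC2 0x90 0x6C. If that holds, it would also explain why no single convmv source encoding worked: each name has effectively been converted twice. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ZipNameFix {
    // Assumption: the archive was created with Windows code page 932 (Shift_JIS family).
    static final Charset CP932 = Charset.forName("windows-31j");

    // 7za case (assumed): raw bytes were read as ISO-8859-1, then written out as UTF-8.
    static String fromSevenZipName(byte[] onDisk) {
        String misread = new String(onDisk, StandardCharsets.UTF_8);
        byte[] original = misread.getBytes(StandardCharsets.ISO_8859_1); // undo the Latin-1 step
        return new String(original, CP932);
    }

    // unzip case (assumed): raw bytes were read as CP437, then written out as ISO-8859-1.
    static String fromUnzipName(byte[] onDisk) {
        String misread = new String(onDisk, StandardCharsets.ISO_8859_1);
        byte[] original = misread.getBytes(Charset.forName("IBM437")); // undo the CP437 step
        return new String(original, CP932);
    }

    public static void main(String[] args) {
        byte[] sevenZipBytes = {(byte) 0xC2, (byte) 0x90, 0x6C};
        byte[] unzipBytes = {(byte) 0xC9, 0x6C};
        // Both lines print U+4EBA (人) if the assumptions above hold
        System.out.printf("U+%04X%n", (int) fromSevenZipName(sevenZipBytes).charAt(0));
        System.out.printf("U+%04X%n", (int) fromUnzipName(unzipBytes).charAt(0));
    }
}
```

Checking a few more known names this way before renaming everything would show whether code page 932 is really the original encoding.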

Similar Messages

  • XML: LPX-00200: could not convert from encoding UTF-8 to UCS2

    Hi,
    Greetings!
    I have a special character in a column, chr(189), and because of it my query returns the error below when I use the XML functions.
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00200: could not convert from encoding UTF-8 to UCS2
    Error at line 1
    ORA-06512: at "SYS.XMLTYPE", line 0
    ORA-06512: at line 1
    I am using SYS_XMLAGG and get the above error when it encounters data like this:
    "Dixon¿s Chicago".
    Note: whenever it encounters the ¿ character, it fails... Any help?
    One more thing: when I create another record with the same data by copying and pasting, it works fine, and when I dump that column's data it is different. See the dump results below.
    Naveen.
    SQL> desc temp_xml;
    Name Null? Type
    TNO NUMBER(4)
    NAME VARCHAR2(255)
    SQL> select name,length(name),dump(replace(name,chr(189),'')) data_dmp from temp_xml;
    NAME LENGTH(NAME) DATA_DMP
    ¿s Chicago 10 Typ=1 Len=12: 239,191,189,115,32,67,104,105,99,97,103,111
    ¿s Chicago 10 Typ=1 Len=11: 194,191,115,32,67,104,105,99,97,103,111
    SQL>
    If you look at the two rows above, the first row's dump shows Len=12 and the second shows Len=11. The second row works fine, but the first gives the error. I cannot see where that hidden character is, and I am not able to remove it.
    Message was edited by:
    naveenhks
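    Decoding the two DUMP byte lists shows what each row actually stores, assuming the column bytes are UTF-8 (the post does not state the database character set, but the dumps are consistent with it). The first row starts with 239,191,189 (EF BF BD), which is U+FFFD REPLACEMENT CHARACTER, typically left behind when an earlier conversion could not map a character; the second starts with 194,191 (C2 BF), which is U+00BF INVERTED QUESTION MARK. Both display as ¿, and both are 10 characters long, matching LENGTH(NAME). It is also consistent with REPLACE(name, chr(189), '') leaving the dump unchanged: chr(189) is the single byte 0xBD, not the three-byte character. The class name is hypothetical.

    ```java
    import java.nio.charset.StandardCharsets;

    public class DumpDecode {
        // Turn the decimal byte list printed by Oracle's DUMP() back into a string,
        // assuming the bytes are UTF-8.
        static String decode(int... dumped) {
            byte[] bytes = new byte[dumped.length];
            for (int i = 0; i < dumped.length; i++) {
                bytes[i] = (byte) dumped[i];
            }
            return new String(bytes, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            String row1 = decode(239, 191, 189, 115, 32, 67, 104, 105, 99, 97, 103, 111);
            String row2 = decode(194, 191, 115, 32, 67, 104, 105, 99, 97, 103, 111);
            // First code point of each row and its length in characters
            System.out.printf("row 1: U+%04X, %d chars%n", row1.codePointAt(0), row1.length()); // row 1: U+FFFD, 10 chars
            System.out.printf("row 2: U+%04X, %d chars%n", row2.codePointAt(0), row2.length()); // row 2: U+00BF, 10 chars
        }
    }
    ```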

    Hi,
    I have a similar problem:
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00200: could not convert from encoding UTF-8 to ISO-8859-1
    I'm executing the following SELECT when I encounter this error:
    SELECT /*+ INDEX(resource_view XDBHI_IDX) */
               extract(resource_view.res, '/Resource/Contents/*').getClobVal()           AS Dokument
      FROM  resource_view
    WHERE resource_view.any_path LIKE '%PATH_TO_FILE%';
    I have 5 XML documents, and this error occurs for two of them ('A' and 'B'). When I transfer the same 5 documents from another PC, the error occurs for document 'C' and not for 'A' and 'B'.
    Any clue or hint that could explain this behaviour? Which NLS parameters can I check to help you understand the situation?
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
    PL/SQL Release 11.2.0.1.0 - Production
    CORE 11.2.0.1.0 Production
    TNS for IBM/AIX RISC System/6000: Version 11.2.0.1.0 - Production
    NLSRTL Version 11.2.0.1.0 - Production

  • LPX-00200: could not convert from encoding UTF-8 to UCS2 during

    When running a query with SQL/XML on a database with the following characteristics:
    NLS_LANGUAGE -- AMERICAN
    NLS_CHARACTERSET - UTF8
    NLS_NCHAR_CHARACTERSET - AL16UTF16
    Database version is 9.2.0.5.0
    I get the following error on text with special characters:
    ORA-31011: XML parsing failed
    ORA-19202: Error occurred in XML processing
    LPX-00200: could not convert from encoding UTF-8 to UCS2
    I followed another developer's tip to use the CONVERT function:
    XMLForest(convert(train.programma, 'UTF8', 'WE8ISO8859P1') AS "programma")
    Now at least I don't get the error, but in the result all special characters (and there are a lot of them!) are garbled.
    So, for example (hope this comes across in the post):
    original : coördinatoren
    in xml : coördinatoren
    Any suggestions? I'm not really allowed to change any database parameters or do an upgrade.
    Maybe I should use something else in the CONVERT function?

    Here's the complete query (with the CONVERT function in it). Hope this helps.
    select XMLElement("lmsImport", XMLAttributes('LmsImportSchema.xsd' as "xsi:noNamespaceSchemaLocation",
    'http://www.w3.org/2001/XMLSchema-instance' as "xmlns:xsi"),
    XMLElement("trainingen",(
    Select XMLAGG(XMLElement("training",
    XMLElement("id", train.id),
    XMLElement("code", train.code),
    XMLElement("leverancier",
    XMLElement("leverancier",
    XMLATTRIBUTES(NVL(train.leverancier, 'vendr000000000001021') as "leverancierId"))),
    XMLElement("status", (CASE when TRUNC(NVL(train.disc_from, sysdate)) >= TRUNC(sysdate) then
    'Actief'
    else
    'Inactief'
    END )),
    XMLElement("naam", train.naam),
    XMLForest(
    XMLForest(Decode(train.afronding, 'Bewijs van Deelname', 'Instituutsdiploma', train.afronding) AS "afronding") as "afrondingen"),
    XMLForest(
    XMLForest(train.lesmethode AS "lesmethode") as "lesmethodes"),
    XMLForest(
    XMLForest(convert(train.programma, 'UTF8', 'WE8ISO8859P1') AS "programma") as "memos"),
    XMLForest(train.lesduur AS "lesduur")).extract('/*'))
    FROM (select cours.id id, cours.course_no code, cours.custom2 leverancier, cours.disc_from disc_from, cours.title naam, 'Training' lesmethode,
    cours.desc1||cours.desc2||cours.desc3||cours.desc4 programma, ROUND(cours.num_days) lesduur, ddcus.str_value afronding
    from tpt_courses cours
    inner join fgt_domain domin
    on cours.split = domin.id
    left join (fgt_dd_custom ddcus
    inner join fgt_dd_domain_to_attr ddoat
    on ddoat.id = ddcus.attr_id)
    on ddcus.owner_id = cours.id
    where (domin.name = 'Content' or domin.name like 'CT%')
    and ddoat.attr_id = 'ddatr900000000000009'
    union
    select prdct.id id, prdct.part_no code, prdct.vendor_id, prdct.disc_from disc_from, prdct.name naam,
    decode(prdct.equip_cat_id, 'eqcat000000000000005', 'WBT', 'eqcat000000000001006', 'Book', 'eqcat000000000001019', 'Training', 'eqcat000000000001037', 'Training') lesmethode,
    prdct.desc1||prdct.desc2||prdct.desc3||prdct.desc4, null, custom7
    from tpt_product_catalog prdct, fgt_domain domin
    where prdct.split = domin.id
    and (domin.name = 'Content' or domin.name like 'CT%')) train )).extract('/*')).extract('/*') from dual;
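    Oracle's CONVERT takes the destination character set first and the source character set second, so convert(train.programma, 'UTF8', 'WE8ISO8859P1') treats the column as ISO-8859-1 and re-encodes it as UTF-8. If the data is already stored as UTF-8 (the database character set here is UTF8), every multi-byte character gets encoded a second time. The sketch below simulates that in Java under this assumption; the class and method names are hypothetical.

    ```java
    import java.nio.charset.StandardCharsets;

    public class DoubleEncoding {
        // Simulates CONVERT(text, 'UTF8', 'WE8ISO8859P1') applied to text that is ALREADY
        // stored as UTF-8 (an assumption): each UTF-8 byte is misread as one ISO-8859-1
        // character, and those characters are what get re-encoded as UTF-8.
        static String convertLatin1ToUtf8(String alreadyUtf8Text) {
            byte[] storedBytes = alreadyUtf8Text.getBytes(StandardCharsets.UTF_8);
            return new String(storedBytes, StandardCharsets.ISO_8859_1);
        }

        public static void main(String[] args) {
            String original = "co\u00F6rdinatoren"; // coördinatoren
            String converted = convertLatin1ToUtf8(original);
            System.out.println(converted); // coÃ¶rdinatoren
            System.out.println(original.length() + " -> " + converted.length() + " characters"); // 13 -> 14 characters
        }
    }
    ```

    If that assumption holds, the CONVERT call avoids the error only by turning each special character into two different ones.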

  • How do you convert the encoder counts to RPM of the motor using a myRIO in LabVIEW?

    My intent is to gain hardware-interfacing and general LabVIEW experience. I am using a myRIO to control a Pittman 8222 34 V brushed DC motor (Motor Specs).
    I recently wrote code to control the angle of this motor using PID, which I visualized with a black disc with a white tick mark on it (see attached picture: "IMG_2523").
    I now want to control the speed of this motor, again using PID, but I am not sure how to properly convert the encoder counts to RPM. I have attached two screenshots of my block diagram and front panel for reference ("Capture" and "Capture2"). In the front panel picture, you can see that the "actual" speed of the motor is a very choppy signal when it should theoretically be a flat line. To get the RPM, I currently use a shift register to store the current count value (which I convert to degrees, then to radians), subtract the previous iteration's value, and divide by the while loop's sampling time (10 ms). This gives me rad/s, which a subVI I wrote then converts to RPM.
    Any help would be greatly appreciated, thanks. 

    Hi,
    One thing you could try is verifying that the conversion from rad/s to RPM is working correctly.
    This tutorial might also be helpful. It's not using the same hardware that you are, but goes through the general steps using PID control. 
    CompactRIO Motor Control Basics Tutorial: http://www.ni.com/pdf/labview/us/compactrio_motor_control_basics.pdf
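    The conversion itself is only a few lines, and it also shows one reason a 10 ms loop can give a choppy reading: the measured speed can only change in steps of 60 / (counts per revolution × Δt) RPM. The sketch below uses Java rather than LabVIEW, and COUNTS_PER_REV = 2000 is a hypothetical value; the real figure comes from the encoder datasheet, including any gearbox ratio and quadrature (×4) decoding. The moving average is one simple smoothing option, not something suggested in the thread.

    ```java
    import java.util.ArrayDeque;

    public class EncoderRpm {
        // Hypothetical encoder resolution in counts per output-shaft revolution.
        static final double COUNTS_PER_REV = 2000.0;
        // Loop period from the post: 10 ms.
        static final double DT_SECONDS = 0.010;

        // Speed from the change in encoder count over one loop iteration.
        static double countsToRpm(long previousCount, long currentCount, double dtSeconds) {
            double revolutions = (currentCount - previousCount) / COUNTS_PER_REV;
            return revolutions / dtSeconds * 60.0;
        }

        // The rad/s -> RPM step the reply suggests checking: 1 rev = 2*pi rad, 1 min = 60 s.
        static double radPerSecToRpm(double radPerSec) {
            return radPerSec * 60.0 / (2.0 * Math.PI);
        }

        // Average of the last `size` readings, to smooth the quantization steps.
        static final class MovingAverage {
            private final ArrayDeque<Double> window = new ArrayDeque<>();
            private final int size;
            private double sum;

            MovingAverage(int size) {
                this.size = size;
            }

            double add(double value) {
                window.addLast(value);
                sum += value;
                if (window.size() > size) {
                    sum -= window.removeFirst();
                }
                return sum / window.size();
            }
        }

        public static void main(String[] args) {
            // Smallest non-zero step: one count in one 10 ms period (about 3 RPM here).
            System.out.println(countsToRpm(0, 1, DT_SECONDS));
            // Near 1000 RPM the raw count per period alternates between 333 and 334,
            // so the raw speed jumps between about 999 and 1002 RPM.
            MovingAverage filter = new MovingAverage(3);
            long count = 0;
            long[] deltas = {333, 333, 334, 333, 333, 334};
            for (long delta : deltas) {
                long previous = count;
                count += delta;
                double raw = countsToRpm(previous, count, DT_SECONDS);
                System.out.printf("raw %.1f RPM, smoothed %.1f RPM%n", raw, filter.add(raw));
            }
        }
    }
    ```

    A longer measurement window (or averaging over several loop periods) trades responsiveness for a smoother speed signal, which matters when that signal feeds a PID loop.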
     

  • Converting Base64 encoded String to String object

    Hi,
    Description:
    I have a Base64-encoded string, and I am using this API to decode it:
    [http://ws.apache.org/axis/java/apiDocs/org/apache/axis/encoding/Base64.html]
    I am simply trying to convert it to a String object.
    Problem:
    When I try to write the String (which is XML) as an XMLType to an Oracle 9i database, I get "Cannot map Unicode to Oracle character." The root problem is the conversion of the Base64-encoded string to a String object: I noticed that some strange square characters show up at the start of the string after I convert it, so it seems I am not converting to a String object properly. If I change the value of the variable on the fly and delete these little characters at the start, I don't get the Unicode error. Just looking for a second opinion on this.
    Code: Converting Base64 Encoded String to String object
    import java.io.UnsupportedEncodingException;
    import org.apache.axis.encoding.Base64;
    public String decodeToString( String base64String ) throws UnsupportedEncodingException {
        byte[] decodedByteArray = Base64.decode( base64String );
        // Interpret the decoded bytes as UTF-8 text
        String decodedString = new String( decodedByteArray, "UTF-8");
        return decodedString;
    }
    Any suggestions?

    To answer bigdaddy's question and clarify a bit more:
    Constraints:
    1. Using an integrated third-party tool that expects a Base64-encoded String and sends back a Base64-encoded String.
    2. Using JSF
    3. Oracle 10g database, storing in an XMLType column.
    Steps in the process:
    1. I submit my Base64-encoded String to this third-party tool.
    2. The tool takes the encoded string and renders output that works correctly. The XML can be modified dynamically using this tool.
    3. I have a button that is bound to my JSF backing bean. When that button is clicked, the third-party tool sets a backing bean String value to the Base64 String representing the updated XML.
    4. On the back end, in my JSF backing bean, I attempt to decode it to a String value to store in the Oracle database as an XMLType. When converting the byte[] array to a String, I get this conversion issue.
    Possibly the tool is sending me text in an encoding other than UTF-8. I thought there might be a better way of doing the decoding that I wasn't seeing. I will proceed down that path and check whether the tool is sending back a different encoding than it should. I was just looking for input on the byte[] decoding.
    Thanks for the input though.
    Edited by: haju on Apr 9, 2009 8:41 AM
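    One possible explanation for the square characters, offered as an assumption since the thread does not confirm it, is a byte order mark (BOM) at the start of the decoded bytes: Java's UTF-8 decoder keeps a UTF-8 BOM as the character U+FEFF, and a UTF-16 BOM decoded as UTF-8 turns into replacement characters. The sketch below uses java.util.Base64 (Java 8+) in place of the Axis class, picks the charset from the BOM if there is one, and otherwise defaults to UTF-8. The class name is hypothetical.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    public class Base64Text {
        // Decode Base64 to text, choosing the charset from a leading byte order mark (BOM)
        // and dropping the BOM; without a BOM, UTF-8 is assumed.
        static String decodeToString(String base64String) {
            // The MIME decoder ignores line breaks that some tools insert into long Base64 strings
            byte[] bytes = Base64.getMimeDecoder().decode(base64String);
            Charset charset = StandardCharsets.UTF_8;
            int offset = 0;
            if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) {
                offset = 3;                              // UTF-8 BOM
            } else if (startsWith(bytes, 0xFE, 0xFF)) {
                charset = StandardCharsets.UTF_16BE;     // UTF-16 big-endian BOM
                offset = 2;
            } else if (startsWith(bytes, 0xFF, 0xFE)) {
                charset = StandardCharsets.UTF_16LE;     // UTF-16 little-endian BOM
                offset = 2;
            }
            return new String(bytes, offset, bytes.length - offset, charset);
        }

        // True if the array begins with the given unsigned byte values.
        private static boolean startsWith(byte[] bytes, int... prefix) {
            if (bytes.length < prefix.length) {
                return false;
            }
            for (int i = 0; i < prefix.length; i++) {
                if ((bytes[i] & 0xFF) != prefix[i]) {
                    return false;
                }
            }
            return true;
        }
    }
    ```

    If the characters are still there after this, the tool is probably returning some other encoding, which is the path the poster planned to check next.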

  • Filename encoding on content server

    Hi experts,
    In DMS we use a file system repository on SAP Content Server installed on HP-UX. When we save files with Russian file names, every character in the saved file name is encoded as % plus its hexadecimal representation, so we can only save files with short names. When we try to save a file with a long name, its encoded name exceeds the 256-character limit and the system gives us an error.
    I appreciate any suggestion or any workaround.
    Best Regards
    Pavel

    The official answer from SAP:
    Content Server operates in non-Unicode mode, which means it cannot store
    files or documents with special characters (Russian here). Hence, before the KPRO layer sends the document to the Content Server for storage, it
    converts the Russian file name to a binary format.
    During the conversion each special (Russian) character is mapped to a particular
    format, so after conversion the file name becomes long. Since it is
    a file system repository, the operating system won't allow the creation of
    files with names longer than 255 characters.
    Hence this is a limitation of the operating system: if the names exceed 255
    characters, it is not supported.
    Even if the Content Server is based on MaxDB, the length of the
    converted result is the same, and MaxDB also has a maximum filename
    length of 254 characters.
    If the ECC system is Unicode, the converted result will be
    shorter than now (this is the behaviour in our internal system, which is
    Unicode). This should resolve the issue, but we are not completely sure
    of this.
    However, if you use the R/3 system database, there will be no issue, as
    no conversion happens there.
    But with MaxDB it works.
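    A rough calculation shows how quickly such names reach the limit. Neither post says which byte encoding is percent-encoded, so the sketch below assumes UTF-8 bytes encoded the way java.net.URLEncoder does it (Java 10+ overload): a Cyrillic letter is two UTF-8 bytes and therefore six characters such as %D0%A0, while ASCII letters, '.' and '-' stay as they are. The class name is hypothetical.

    ```java
    import java.net.URLEncoder;
    import java.nio.charset.StandardCharsets;

    public class EncodedNameLength {
        // Length of a file name after UTF-8 percent-encoding (assumed scheme).
        static int encodedLength(String fileName) {
            return URLEncoder.encode(fileName, StandardCharsets.UTF_8).length();
        }

        public static void main(String[] args) {
            // "Роли-транзакции": 14 Cyrillic letters and one hyphen
            String name = "\u0420\u043E\u043B\u0438-\u0442\u0440\u0430\u043D\u0437\u0430\u043A\u0446\u0438\u0438";
            System.out.println(encodedLength(name)); // 85 = 14 * 6 + 1
            // Cyrillic letters that fit in a 255-character name under this scheme
            System.out.println(255 / 6); // 42
        }
    }
    ```

    Under this assumption, a purely Cyrillic name of about 43 letters is already too long, which fits the "only short names work" behaviour described in the question.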

  • Filename encoding from content server

    Dear gurus,
    I need your help. We use Content Server as the document storage for our ERP system. I am seeing a problem: for example, I try to open a file named "Роли-транзакции MM.xls" from the shared folders in SBWP. The first time, it opens without any problem. When I then try to open this file or another one, I get the message "file is not found <name of file>". When I go to the sapworkdir (or sapgui) folder, I see a file with a name like "####-##########MM_20130919063339.006_X.xls". Files with English letters in the name open correctly. I understand that the problem is somewhere in the content server, but where?

    Hi Dmitry,
    Do you do all the activities with logon language RU and a Windows front end system where the system code page is based on Cyrillic? If not, please retry with a "completely localized" setup.
    The fact that the Russian characters are converted to # symbols indicates that somewhere there is a conversion from Unicode to non-Unicode (probably Latin-1, maybe based on the logon language), and that this conversion fails.
    Unfortunately I am not familiar with Content Server, and these kinds of problems are difficult to analyze without debugging directly on the system. Therefore I think it is best to raise a customer message at SAP.
    Or maybe someone who knows Content Server very well will be able to help...
    Best regards,
    Nils
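    The Unicode to Latin-1 failure Nils describes can be reproduced outside SAP. The sketch below encodes the file name from the question into ISO-8859-1 and replaces every character that has no Latin-1 equivalent with '#'; using '#' as the substitute is an assumption chosen to match what appears in sapworkdir, not a description of SAP's internal code. The class name is hypothetical.

    ```java
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    public class LossyLatin1 {
        // Convert Unicode text to ISO-8859-1, replacing unmappable characters with '#'.
        static String toLatin1WithHash(String text) {
            CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .replaceWith(new byte[] {'#'});
            try {
                ByteBuffer encoded = encoder.encode(CharBuffer.wrap(text));
                byte[] bytes = new byte[encoded.remaining()];
                encoded.get(bytes);
                return new String(bytes, StandardCharsets.ISO_8859_1);
            } catch (CharacterCodingException e) {
                // Not expected: both error actions are REPLACE
                throw new IllegalStateException(e);
            }
        }

        public static void main(String[] args) {
            // "Роли-транзакции MM.xls"
            String name = "\u0420\u043E\u043B\u0438-\u0442\u0440\u0430\u043D\u0437\u0430\u043A\u0446\u0438\u0438 MM.xls";
            System.out.println(toLatin1WithHash(name)); // ####-########## MM.xls
        }
    }
    ```

    Once the name has gone through such a conversion, the original Cyrillic characters cannot be recovered from it, which matches the "file is not found" behaviour on the second attempt.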

  • Get canvas.toDataURL('image/jpeg') and convert base64 encoding to java.sql.Blob

    Convert canvas.toDataURL('image/jpeg') to java.sql.Blob.
    I am using Oracle ADF, so I am able to call a backing bean from JavaScript and pass in parameters as a map. I pass in canvas.toDataURL('image/jpeg'), which I then try to decode in my bean. After decoding with BASE64Decoder and writing the byte array to a file, I can see the image is corrupted because I can't open the file, so converting the byte array to a Blob is also a waste.
    Does anyone have any ideas on decoding the Base64 from canvas.toDataURL to a file or Blob?

    Use Case:
    A JSF page that lets a user take photos with the HTML5 canvas feature (interacting with the webcam) and upload them to their profile.
    1. I have created the JSF page with the JavaScript below; it pops up as a dialog and works okay. Clicking "upload image" triggers the snapImage JavaScript function below, which sends the imgURL parameter to the server-side managed bean.
    <!-- java script-->
    function snapImage(event){
        var canvas = AdfPage.PAGE.findComponent('canvas');
        AdfCustomEvent.queue(event.getSource(), "getCamImage", {imgURL: canvas.toDataURL('image/jpeg')}, true);
        event.cancel();
    }
    <!-- bean -->
    public void getCamImage(ClientEvent ce){
        String url = (String) ce.getAttributes().get("imgURL");
        try {
            decodeBase64URLToBlob(url);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
    private BlobDomain decodeBase64URLToBlob(String url64) throws IOException {
        BASE64Decoder de = new BASE64Decoder();
        // BASE64Decoder's String method is decodeBuffer(), which throws IOException
        byte[] bytes = de.decodeBuffer(url64);
        File file = new File("abc.jpg");
        InputStream in = new ByteArrayInputStream(bytes);
        BufferedImage bImageFromConvert = ImageIO.read(in);
        in.close();
        ImageIO.write(bImageFromConvert, "jpg", file);
        return createBlobDomainFromFile(file);
    }
    ----problem---
    Opening the generated JPEG file shows that the image is corrupted, probably because of missing bytes or an encoding/decoding issue. The Blob uploaded to the database is saved as a binary stream which, on download, doesn't render as an image or as anything I recognize.
    Is there any way of achieving the conversion without errors?
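    A likely cause, though the thread does not confirm it, is that canvas.toDataURL() returns a data URL of the form data:image/jpeg;base64,<payload> rather than bare Base64, so the decoder is also handed the data:image/jpeg;base64, header. The sketch below strips everything up to the first comma, decodes with java.util.Base64 (Java 8+, used here in place of the internal sun.misc.BASE64Decoder), and keeps the bytes as they are instead of re-encoding them through ImageIO. The class and method names are hypothetical; the resulting byte[] is what would be written to the file or Blob.

    ```java
    import java.util.Base64;

    public class DataUrlDecoder {
        // Decode a Base64 data URL ("data:<mime>;base64,<payload>") to raw bytes.
        static byte[] decodeDataUrl(String dataUrl) {
            int comma = dataUrl.indexOf(',');
            if (!dataUrl.startsWith("data:") || comma < 0) {
                throw new IllegalArgumentException("not a data URL");
            }
            String header = dataUrl.substring(0, comma);
            if (!header.endsWith(";base64")) {
                throw new IllegalArgumentException("data URL is not Base64-encoded");
            }
            // Only the part after the comma is Base64
            return Base64.getDecoder().decode(dataUrl.substring(comma + 1));
        }

        public static void main(String[] args) {
            // JPEG data starts with the bytes FF D8 FF; "/9j/" is their Base64 form.
            byte[] bytes = decodeDataUrl("data:image/jpeg;base64,/9j/");
            System.out.printf("%02X %02X %02X%n", bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF); // FF D8 FF
        }
    }
    ```

    Checking that the decoded bytes start with FF D8 FF is a quick way to confirm the payload really is a JPEG before storing it.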

  • LPX-00200: could not convert from encoding UTF-8 to UCS2

    Dear Gurus,
    I am facing this error while I am parsing an XML File.
    My Database settings are
    NLS_LANGUAGE -- AMERICAN
    NLS_CHARACTERSET - AL32UTF8
    NLS_NCHAR_CHARACTERSET - AL16UTF16
    Database version is 10gR2 (10.2.0.3.0).
    My XML file has the following XML declaration:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    If I change the encoding to UTF-8, the XML parsing works, but special characters like Ö and ß are converted to other characters like öß, which I don't want.
    Appreciate any help.
    Regards
    Madhu K

    I have run into this problem too. I solved it by going back to the original XML file: the '-' characters caused the problem:
    <?xml version="1.0" ?>
    - <ROWSET>
    - <ROW>
    <DEPTNO>10</DEPTNO>
    <DNAME>ACCOUNTING</DNAME>
    <LOC>NEW YORK</LOC>
    </ROW>
    - <ROW>
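    Changing only the encoding attribute does not change the bytes in the file. If the file really is ISO-8859-1 (the question does not show its bytes, so this is an assumption), Ö is the single byte 0xD6, which is not valid UTF-8. One way to keep the characters intact is to decode the file as ISO-8859-1, rewrite the declaration, and write the text back out as UTF-8. The class name and the regular expression are hypothetical, and the expression only handles a declaration like the one in the question.

    ```java
    import java.nio.charset.StandardCharsets;

    public class Latin1XmlToUtf8 {
        // Re-encode ISO-8859-1 XML bytes as UTF-8 and update the XML declaration to match.
        static byte[] transcode(byte[] latin1Xml) {
            String text = new String(latin1Xml, StandardCharsets.ISO_8859_1);
            // Rewrites encoding="ISO-8859-1" (either quote style, any case) in the declaration only.
            String updated = text.replaceFirst(
                    "(?i)(<\\?xml[^>]*encoding=)([\"'])ISO-8859-1\\2", "$1$2UTF-8$2");
            return updated.getBytes(StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            byte[] input = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\" ?><a>\u00D6\u00DF</a>"
                    .getBytes(StandardCharsets.ISO_8859_1);
            byte[] output = transcode(input);
            // <?xml version="1.0" encoding="UTF-8" ?><a>Öß</a>
            System.out.println(new String(output, StandardCharsets.UTF_8));
        }
    }
    ```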

  • DNG Converter filename as original date from EXIF

    Hi there,
    I like the DNG Converter, except that I would like to use the date from the EXIF information as the filename. That doesn't seem to be possible.
    Does anybody know of a(nother) way of doing this?
    Thanks!
    Regards,
    D.M.

    Under section 3 in the DNG Converter, select one of the date options from one of the popup menus.
    If you don't like that format, consider using Bridge or Lightroom to perform the renaming.

  • Help me convert filename .jpg.xml to a format i can manipulate

    I have to access some photos from a file with the extension .jpg.xml. How do I convert this to something I can use? I'm so lost with tech stuff.

    Welcome to Apple Discussions
    What happens when you remove the .xml extension? Provided that it's a valid JPEG file in the first place, removing the .xml extension should work. Make sure to remove the period as well, so it's just
    <filename>.jpg
    What happens if you remove the .xml, and then change the .jpg part to something else, like .png? On a Mac, just changing the file extension usually changes the file type as well.
    Try opening <filename>.jpg in Preview, or in Cover Flow.
    Chances are it will open in Adobe CS. Try that and see if the image appears. If it does, use the Save As command.

  • Dilemma converting arbitrary encoding to UTF-8

    Here's my dilemma: I recently modified our webapp to use UTF-8 encoding across the board, since data with special characters that users added to the content management backend was being displayed incorrectly in ISO-8859-1. It works great for Strings we get from the database, since it uses UTF-8. The problem now is that there are also files that consist of html chunks that get added to pages when they're rendered by the jsps. Those files aren't always UTF-8 encoded, so characters are displaying incorrectly in those parts of the page.
    The problem is that we don't know what encoding the HTML chunks are in: some are ISO-8859-1, some are Windows-1252, etc. There are hundreds of them, and the users use all kinds of programs to generate the files (FrontPage, Dreamweaver, etc.), so there's no common encoding. I'm trying to modify the code that reads those files so it converts the text to UTF-8 for display, but without knowing what encoding a file is in, how can you do the conversion properly? Here's the code I have currently:
            ByteArrayInputStream contentInput = file.getContent();
            // wrap byte stream in UTF-8 character stream
            BufferedReader br = new BufferedReader(new InputStreamReader(contentInput, "UTF-8"));
            StringBuffer outputBuffer = new StringBuffer("");
            String readString;
            // read line by line, stopping before the null that marks the end of the stream
            while ((readString = br.readLine()) != null) {
                outputBuffer.append(readString);
            }
    We get a ByteArrayInputStream from the third-party API, which I wrap in a UTF-8 BufferedReader. The problem is that, for instance, this character '�', when encoded in the file as ISO-8859-1, gets garbled when converted to UTF-8.
    My question is: Is there a way to convert text to UTF-8 without knowing the encoding of the file? I suspect the answer is no, but I'm really hoping it's yes, since the alternative is re-encoding hundreds to thousands of files in the db, then retraining hundreds of users to always save files as UTF-8. (You can't see my brain spasming at the thought of that, but trust me, it is ;P).

    As an update, in case anyone else runs into this same problem:
    I used the SmartEncodingInputStream from uncle_alice's link, and it works just well enough to solve my problem. The only encoding it guessed correctly was UTF-8; it guessed windows-1252 for US-ASCII, windows-1252, and ISO-8859-1. Since windows-1252 is a superset of ASCII and of the printable ISO-8859-1 characters, decoding with windows-1252 reads all the characters from those encodings correctly. All the content I tested with was decoded correctly, presumably because it all uses one of those four encodings. The one snag I hit was that SmartEncodingInputStream doesn't reset the InputStream after reading it, so I have to do that manually after getting the guessed encoding. Here's the code I used:
            // Get the file content
            ByteArrayInputStream contentInput = file.getContent();
            StringBuffer outputBuffer = new StringBuffer("");
            // wrapper around the input stream that guesses the encoding of the stream
            SmartEncodingInputStream smartIS = null;
            // use an 8k buffer, and a default encoding of windows-1252
            smartIS = new SmartEncodingInputStream(contentInput, SmartEncodingInputStream.BUFFER_LENGTH_8KB,
                    Charset.forName("windows-1252"));
            String charsetName = smartIS.getEncoding().name();      // get the name of the encoding guessed
            contentInput.reset();       // reset the position to the beginning of the stream
            // decode through a Reader so multi-byte characters split across buffer boundaries stay intact
            Reader reader = new InputStreamReader(contentInput, charsetName);
            char[] charBuffer = new char[8192];
            int charsRead = 0;
            while( (charsRead = reader.read(charBuffer, 0, charBuffer.length)) > 0 ) {
                outputBuffer.append(charBuffer, 0, charsRead);
            }
            reader.close();     // also closes contentInput
    I left out the try/catch blocks for readability. I get the ByteArrayInputStream from a library call, and end up with the decoded file contents in outputBuffer.
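    A simpler heuristic than SmartEncodingInputStream, swapped in here for comparison rather than taken from the thread, relies on the fact that ISO-8859-1 or windows-1252 text containing accented characters rarely happens to be valid UTF-8. The sketch tries a strict UTF-8 decode first and falls back to windows-1252 if any byte sequence is malformed. The class name is hypothetical, and like any guess it can be wrong for other encodings (Shift_JIS or UTF-16, for example).

    ```java
    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CodingErrorAction;
    import java.nio.charset.StandardCharsets;

    public class GuessingDecoder {
        static final Charset FALLBACK = Charset.forName("windows-1252");

        // Decode as UTF-8 if the bytes are valid UTF-8, otherwise as windows-1252.
        static String decode(byte[] content) {
            try {
                return StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content))
                        .toString();
            } catch (CharacterCodingException e) {
                // Not valid UTF-8: assume a single-byte Western encoding
                return new String(content, FALLBACK);
            }
        }

        public static void main(String[] args) {
            byte[] utf8 = "caf\u00E9".getBytes(StandardCharsets.UTF_8);
            byte[] cp1252 = "caf\u00E9".getBytes(FALLBACK);
            // Both decode to "café"
            System.out.println(decode(utf8));
            System.out.println(decode(cp1252));
        }
    }
    ```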

  • Is there a way to add text to converted filename?

    First I wanna say, I just LOOOVE the New Media Encoder CS5.5!
    I want to add text to the names of a bunch of files I'm converting, such as:
    [(sourcename)] + ["ProRes422 25fps"]
    So say the original file names are:
    MVI_1459.mov
    MVI_1460.mov
    MVI_1461.mov
    so after export it'll be
    MVI_1459 ProRes422 25fps.mov
    MVI_1460 ProRes422 25fps.mov
    MVI_1461 ProRes422 25fps.mov
    Is there a way to do that? Like in Apple compressor?
    If there is then this program is flawless!!!
    Please tell me how?!

    No, there isn't an option to append something like the name of the encoding preset. You'd have to manually rename each one in the AME queue. Make a feature request here: Adobe Feature Request/Bug Report Form
    You could use Bridge to Batch Rename the files, however, or any other application that lets you modify the names of multiple files in one go. Check out this help document page: Batch rename files

  • How to convert filenames into strings

    I'm listing files
    using the method listFiles(), which returns data of type File, which are actually
    all the files in the dir.
    How do I convert the returned filenames to Strings,
    so that I can manipulate the filenames as Strings instead of as Files?
    Thanks for any help!

    I'm running into a problem before I can test the method...
    This is the current problem I have.
    I only show part of the code here:
    File[] listing = Dir.listFiles();
    String[] lists;
    for (int i=0; i < listing.length; i++) {
        lists[i] = new String(listing[i].getName());
        out.print(lists[i] + "<br>");
    }
    The error I get:
    variable lists might not have been initialized
    lists[i] = new String(listing[i].getName());
    ^
    1 error
    I forgot how to deal with String arrays...
    I declared it as String[] lists; because I do not know the size of the array...
    How do I assign the filenames to Strings?
    Please kindly help. Thanks a lot!
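    The compiler error comes from declaring String[] lists without ever creating the array. Once listFiles() has returned, its length gives the size needed; alternatively, File.list() returns the names directly as a String[]. A minimal sketch, where the class name is hypothetical and the directory defaults to the current one:

    ```java
    import java.io.File;

    public class ListNames {
        // Collect the names of the entries in a directory as Strings.
        static String[] fileNames(File dir) {
            File[] listing = dir.listFiles();
            if (listing == null) {
                // listFiles() returns null if dir is not a directory or cannot be read
                return new String[0];
            }
            String[] names = new String[listing.length]; // size is known once listing exists
            for (int i = 0; i < listing.length; i++) {
                names[i] = listing[i].getName(); // getName() already returns a String
            }
            return names;
        }

        public static void main(String[] args) {
            File dir = new File(args.length > 0 ? args[0] : ".");
            for (String name : fileNames(dir)) {
                System.out.println(name);
            }
            // Alternative: dir.list() returns the names directly (or null on error).
        }
    }
    ```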

  • Stop Videos from Up-Converting during Encode

    Hello. I have a series of movies and audio tracks in MP4 and AC3, respectively. The AC3 files encode to about the same size they are now, but the MP4 files become much larger (a 2.48 MB file encodes to 44.7 MB). The videos are at 880x660 resolution. What should I convert to in order to stop this? Thanks! (DVD Studio Pro 4).

    OK, but I extracted the AC-3 audio from the MP4 files. Can I get rid of the audio in the MP4 files completely? The audio in the MP4 files encodes to a huge size... I want to strip it out so the overall file size will be smaller.
    I'm not sure I understand what you're asking. You do want both audio and video on the DVD version, don't you?
    To be on the safe side, I'd go ahead and let Compressor also make new AC3 audio files at the same time you do the MPEG-2 encoding.
    -DH
    Message was edited by: David Harbsmeier

Maybe you are looking for

  • Apple TV 2 Stops Playing after 2 Songs

    My apple TV 2 plays 2 songs and stops since I downloaded the iTunes and Apple TV updates. Is Apple jamming our home libraries so that we pay $24.99 per year for iTunes Match? I'm ready to drop Apple products.

  • Apple TV 3: TV Shows icon missing

    Hello, as I was looking at the home screen of my Apple TV 3 (with latest software, 5.1.1 (5433)) the TV Shows icon disappeared.  This happened just a few moments ago. I now only have Movies, Music, Computers and Settings on the top row of icons. I li

  • Editing documents ( URL / web linked documents )

    Hi, I'm trying to edit documents (uploaded files and urls) through the portal using the EDK. For uploaded files I have created a Portlet that uploads the replacement file and replaces the exisitng one in the filesystem on the server where portal is r

  • Stuck in landscape then portrait mode...

    After attempting to watch a video, the iPad stayed stuck in landscape mode. None of the following helped: turning on & off, attempting to sync, shaking it around, using different apps. I finally did a Reset All Settings and it put the device back in

  • General jsf question

    when is it necessary to do component binding?