Php / mysql  replace non-ascii character in a string

I have a script that converts a ms word document to text then
uploads that to a blob field on a mysql db.
During the conversion some characters my not be recognised.
When i then call up the blob for display on the browser...those
characters show up as unknown characters with a ? or box. Is there
a way to preg_replace those unknown characters before displaying
them.
thanks
ian

.oO(surfinIan)
>I have a script that converts a ms word document to text
then uploads that to a
>blob field on a mysql db.
> During the conversion some characters my not be
recognised. When i then call
>up the blob for display on the browser...those characters
show up as unknown
>characters with a ? or box. Is there a way to
preg_replace those unknown
>characters before displaying them.
What about fixing the encoding problem instead? If chars get
lost during
such a transfer
document->script->database->script->browser it's always
an encoding problem somewhere down the road.
The recommendation these days is to use UTF-8, which avoids
most of
these old problems. You just have to make sure that your
documents are
properly stored as UTF-8 in the database and delivered as
such to the
script and the browser, then you don't have to worry about
special chars
anymore.
That's just the general idea. I can't be more specific, since
I don't
know your conversion script or the database structure.
Micha

Similar Messages

  • Replacing non-ascii characters in String

    I have a site where the user enters data in a rich text
    editor (ktml4) that gets stored into a database (mysql). There are
    non ascii characters getting into the data, I'm assuming that they
    are copying and pasting from Word. Unfortunately in this situation,
    changing that process isn't an option.
    Currently, this is the only character that is causing me
    problems:
    http://www.zvon.org/other/charSearch/PHP/search.php?request=ffa0&searchType=3
    I would just like to replace the non-ascii characters with a
    space when I read them from the database. Something like:
    #Replace(result.column, '\xffa0', ' ')#
    However, I believe that code looks for the string "\xffa0",
    not the character \xffa0.
    Is there anyway to do this?

    quote:
    Originally posted by:
    BuckLemke
    quote:
    Originally posted by:
    Dan Bracuk
    rereplace might work.
    Can you give an example of how to pass a non-ascii character
    to REReplace?
    Regular expressions are not my strength, but the approach I
    was considering was, "if it's not an ascii character, make it a
    space". Then you pass the entire string at once.

  • Non-ASCII character in Email field

    Hi Guys,
    I am trying to enter non-english characters in Email field of user form, but OIM throws an error that "A non-Ascii character has been entered". I have also tried to turn off the AppFirewall Filter in xlConfig.xml file but no help. Is there any way thay I can enter non-Ascii characters in Email field?
    Regards,
    Rahul

    .oO(surfinIan)
    >I have a script that converts a ms word document to text
    then uploads that to a
    >blob field on a mysql db.
    > During the conversion some characters my not be
    recognised. When i then call
    >up the blob for display on the browser...those characters
    show up as unknown
    >characters with a ? or box. Is there a way to
    preg_replace those unknown
    >characters before displaying them.
    What about fixing the encoding problem instead? If chars get
    lost during
    such a transfer
    document->script->database->script->browser it's always
    an encoding problem somewhere down the road.
    The recommendation these days is to use UTF-8, which avoids
    most of
    these old problems. You just have to make sure that your
    documents are
    properly stored as UTF-8 in the database and delivered as
    such to the
    script and the browser, then you don't have to worry about
    special chars
    anymore.
    That's just the general idea. I can't be more specific, since
    I don't
    know your conversion script or the database structure.
    Micha

  • Replace non ascii characters

    I have a person that is called "josée".
    I need "JOSEE" to be inserted into the database, not "JOSéE".
    I also have a person called "jürgen".
    I need "JURGEN" to be inserted into the database.
    Is there a way to do this ?
    (replace the non-ascii character by his corresponding ascii character)

    Thanks ebrian, I tried your suggestion:
    SELECT VALUE, REGEXP_REPLACE(VALUE, '^[:ascii:]') FROM AUDIT_TAB_COLUMNS WHERE VALUE = '' ;
    It sill leaves in the square unfortuneatly.
    Dave

  • [Solved] no non-ASCII character input in rxvt-unicode

    Hello everyone,
    For some days now, I can't write any non-ASCII characters any more in rxvt-unicode and rxvt-unicode-patched. Unfortunately, downgrading the rxvt-unicode package doesn't seem to help. To have at least a temporary solution, I'd like to know at least which packages I could try to downgrade as well. Any ideas, anyone?
    greez,
    maxmin
    Last edited by Maximalminimalist (2011-03-12 13:12:26)

    When I try to type a non-ASCII-character I get nothing at all. This happens with my custom keyboard layout (modified programmer dvorak) and in some layouts I already tried (us: altgr-intl, ch, de and fr)
    When I paste a non-ASCII characters in rxvt-unicode I get
    maxmin ~ $ ?
    This happens only on my x86_64 desktop which is more up to date than my i686 laptop. (I'm afraid now to do any updates.)
    EDIT: I'm sorry, I don't know what you mean with locale settings. What do you mean with that?
    EDIT2: Maybe just typing locale in the terminal is what you mean:
    maxmin ~ $ locale
    locale: Cannot set LC_CTYPE to default locale: No such file or directory
    locale: Cannot set LC_MESSAGES to default locale: No such file or directory
    locale: Cannot set LC_ALL to default locale: No such file or directory
    LANG=en_US.utf8
    LC_CTYPE="en_US.utf8"
    LC_NUMERIC="en_US.utf8"
    LC_TIME="en_US.utf8"
    LC_COLLATE="en_US.utf8"
    LC_MONETARY="en_US.utf8"
    LC_MESSAGES="en_US.utf8"
    LC_PAPER="en_US.utf8"
    LC_NAME="en_US.utf8"
    LC_ADDRESS="en_US.utf8"
    LC_TELEPHONE="en_US.utf8"
    LC_MEASUREMENT="en_US.utf8"
    LC_IDENTIFICATION="en_US.utf8"
    LC_ALL=
    With other terminal emulators I get sometimes also nothing and sometimes right displayed but wrong interpreted character in vim. I didn't take notes while doing that but I'll try again if needed.
    Last edited by Maximalminimalist (2011-03-06 21:51:23)

  • ALV Grid bug when dealing with non-ASCII character

    Dear all,
    I have a requirement to display user's remarks on ALV.  The data element of the remarks column is TEXT200.  I know that each column in an ALV Grid can display at most 128 characters.  Since my SAP is an Unicode system, I expect that each column in my ALV Grid can display 128 Chinese characters, too.  However, the ALV Grid only display 42 Chinese characters at most.  Is this a bug in ALV Grid?  How can I fix it?
    I did a small experiment.  The results are listed below.  My version is Net Weaver 7.01.  The results show that the bug does not exist in ALV List.  However, my user prefers ALV Grid, which is more beautiful and elegant.
    Type of ALV
    Max number of
    ASCII character
    in an ALV column
    Max number of
    non-ASCII character
    in an ALV column
    REUSE_ALV_GRID_DISPLAY
    128
    42 Chinese characters
    CL_SALV_TABLE
    128
    42 Chinese characters
    CL_GUI_ALV_GRID
    128
    42 Chinese characters
    REUSE_ALV_LIST_DISPLAY
    132
    132 Chinese characters
    If you encounter the bug, please post your solution.  Thanks a lot. 

    It looks like limitation of ALV grid cell, which can contain up to 128 bytes in SAP gui.
    Your unicode characters are probably 3 bytes each.
    Check OSS Note 910300 for more detailed info.
    EDIT: Note 1401711 seems to be a correction for your issue It allows to use 128 characters (even if they take more than 128 bytes).

  • Is Linksys WRT54GH SSID can contains the non-ascii character?

    is Linksys WRT54GH SSID can contains the non-ascii character?
    we need to use it for our wireless testing, but i dont know if the SSID can contains non-ascii.
    anybody can help me? hurry, i will wait answer online.
    thanks in advance!
    Solved!
    Go to Solution.

    thank you  very much, Ricewind
    SSID cant contain non-ascii characters, it make me sad and disappointed
    why we can  set T-link router SSID with non-ascii characters?

  • Unicode value of a non-ASCII character

    Hi,
    Suppose, the unicode value of the character ् is '\u094d'.
    Is there any Java function which can get this unicode value of a non-ASCII character.
    Like:
    char c='्';
    String s=convertToUnicode(c);
    System.out.println("The unicode value is "+ s);
    Output:
    The unicode value is \u094d
    Thanks in advance

    Ranjan_Yengkhom wrote:
    I have tried with the parameter
    c:\ javac -encoding utf8 filename.java
    Still I am getting the same print i.e. \u3fIf it comes out as "\u3f" (instead of failing to compile or any other value), then your source code already contains the question mark. So you already saved it wrong and have to re-type it (at least the single character).
    >
    Then I studied one tutorial regarding this issue
    http://vietunicode.sourceforge.net/howto/java/encoding.html
    It says that we need to save the java file in UTF-8 format. I have explored most of the editors like netbean, eclipse, JCreator, etc... there is no option to save the java file in UTF-8 format.That's one way. But since that is so problematic (you'll have to remember/make sure to always save it that way and to compile it using the correct switch), the better solution by far is not to use any non-ASCII characters in your source code.
    I already told you two possible ways to achieve that: unicode escapes or externalized strings.
    Also read http://www.joelonsoftware.com/articles/Unicode.html (just because it's related, essential information and I just posted that link somewhere else).

  • Remove non ascii character

    i need a SQL or Procedure that will search non ascii character  in data and update the data by removing it
    Suppose there is table TABLE1 with Column NAME
    it contain number of row and few has non ascii character eg 'CharacterÄr'
    My sql or procedure should be able to search  'CharacterÄr' and update the row with 'Character'
    i.e. removing the non ascii character 'Ä' from the data

    Hi,
    Okay, in that case:
    SELECT str
    ,      REGEXP_REPLACE ( str
                          , '[^[:cntrl:] -~]'
                          )   AS new_str
    FROM    table_x
    or, to actually change the rows that contain the bad characters:
    UPDATE  table_x
    SET     str = REGEXP_REPLACE ( str
                                 , '[^[:cntrl:] -~]'
    WHERE   REGEXP_LIKE ( str
                        , '[^[:cntrl:] -~]'

  • Find the Special character and non Ascii character

    Hi:
    i have table ,this table column name contain some datas like
    sno name
    1 CORPORATIVO ISO, S.A. DE C.V.
    2 (주)엠투소프트
    3 TIMELESS
    4 南京南瑞集团公司
    5 PHOTURIS
    6 Ace Informática S/C ltda
    7 Computacenter AG & Co. oHG
    8 아이티앤씨
    9 MOCA
    10 anbarasan
    my requirement:
    1)i need to search the name column where contain the special character and non ascii character..if found any non ascii or spcial character ..need to say flag ''yes".if not found need to say "no"...kindly help on this issus...

    i need some example..i am not have any idea....
    i have table ,this table column name contain some datas like
    sno name
    1 CORPORATIVO ISO, S.A. DE C.V.
    2 (주)엠투소프트
    3 TIMELESS
    4 南京南瑞集团公司
    5 PHOTURIS
    6 Ace Informática S/C ltda
    7 Computacenter AG & Co. oHG
    8 아이티앤씨
    9 MOCA
    10 anbarasan
    my requirement:
    1)i need to search the name column where contain the special character and non ascii character..if found any non ascii or spcial character ..need to say flag ''yes".if not found need to say "no"...kindly help on this issus...

  • Non-hex character in hex string

    I'm getting this messsage when I try to open an Illustrator file:
    "Acrobat PDF file format is having difficulties. Non-hex character in a string."
    I', using illustrator CS6 and I need my file opened..!

    Nobody can know without seeing the file and having more info about your system. Does the file open in Acroibat/ Adobe Reader for instance?
    Mylenium

  • Non-Alphanumeric Character in a String

    I understand if we have a non-alphanumeric character in a String, we have to put a backward slash in front of that character. For example, we have to put a backward slach in front of a "backward slash". Eg. we do \\ to put a backward slash in a String.
    What about a dot (.) or a dash (-)? Do we have to do the same thing?

    No, it's only for control characters like "newline", characters with special meaning in strings i.e. " and \ and any character that can't be represented in the character encoding of your java source file.

  • Replacing a special character in a string with another string

    Hi
    I need to replace a special character in a string with another string.
    Say there is a string -  "abc's def's are alphabets"
    and i need to replace all the ' (apostrophe) with &apos& ..which should look like as below
    "abc&apos&s def&apos&s are alphabets" .
    Kindly let me know how this requirement can be met.
    Regards
    Sukumari

    REPLACE
    Syntax Forms
    Pattern-based replacement
    1. REPLACE [{FIRST OCCURRENCE}|{ALL OCCURRENCES} OF]
    pattern
              IN [section_of] dobj WITH new
              [IN {BYTE|CHARACTER} MODE]
              [{RESPECTING|IGNORING} CASE]
              [REPLACEMENT COUNT rcnt]
              { {[REPLACEMENT OFFSET roff]
                 [REPLACEMENT LENGTH rlen]}
              | [RESULTS result_tab|result_wa] }.
    Position-based replacement
    2. REPLACE SECTION [OFFSET off] [LENGTH len] OF dobj WITH new
                      [IN {BYTE|CHARACTER} MODE].
    Effect
    This statement replaces characters or bytes of the variable dobj by characters or bytes of the data object new. Here, position-based and pattern-based replacement are possible.
    When the replacement is executed, an interim result without a length limit is implicitly generated and the interim result is transferred to the data object dobj. If the length of the interim result is longer than the length of dobj, the data is cut off on the right in the case of data objects of fixed length. If the length of the interim result is shorter than the length of dobj, data objects of fixed length are filled to the right with blanks or hexadecimal zeroes. Data objects of variable length are adjusted. If data is cut off to the right when the interim result is assigned, sy-subrc is set to 2.
    In the case of character string processing, the closing spaces are taken into account for data objects dobj of fixed length; they are not taken into account in the case of new.
    System fields
    sy-subrc Meaning
    0 The specified section or subsequence was replaced by the content of new and the result is available in full in dobj.
    2 The specified section or subsequence was replaced in dobj by the contents of new and the result of the replacement was cut off to the right.
    4 The subsequence in sub_string was not found in dobj in the pattern-based search.
    8 The data objects sub_string and new contain double-byte characters that cannot be interpreted.
    Note
    These forms of the statement REPLACE replace the following obsolete form:
    REPLACE sub_string WITH
    Syntax
    REPLACE sub_string WITH new INTO dobj
            [IN {BYTE|CHARACTER} MODE]
            [LENGTH len].
    Extras:
    1. ... IN {BYTE|CHARACTER} MODE
    2. ... LENGTH len
    Effect
    This statement searches through a byte string or character string dobj for the subsequence specified in sub_string and replaces the first byte or character string in dobj that matches sub_string with the contents of the data object new.
    The memory areas of sub_string and new must not overlap, otherwise the result is undefined. If sub_string is an empty string, the point before the first character or byte of the search area is found and the content of new is inserted before the first character.
    During character string processing, the closing blank is considered for data objects dobj, sub_string and new of type c, d, n or t.
    System Fields
    sy-subrc Meaning
    0 The subsequence in sub_string was replaced in the target field dobj with the content of new.
    4 The subsequence in sub_string could not be replaced in the target field dobj with the contents of new.
    Note
    This variant of the statement REPLACE will be replaced, beginning with Release 6.10, with a new variant.
    Addition 1
    ... IN {BYTE|CHARACTER} MODE
    Effect
    The optional addition IN {BYTE|CHARACTER} MODE determines whether byte or character string processing will be executed. If the addition is not specified, character string processing is executed. Depending on the processing type, the data objects sub_string, new, and dobj must be byte or character type.
    Addition 2
    ... LENGTH len
    Effect
    If the addition LENGTH is not specified, all the data objects involved are evaluated in their entire length. If the addition LENGTH is specified, only the first len bytes or characters of sub_string are used for the search. For len, a data object of the type i is expected.
    If the length of the interim result is longer than the length of dobj, data objects of fixed length will be cut off to the right. If the length of the interim result is shorter than the length of dobj, data objects of fixed length are filled to the right with blanks or with hexadecimal 0. Data objects of variable length are adapted.
    Example
    After the replacements, text1 contains the complete content "I should know that you know", while text2 has the cut-off content "I should know that".
    DATA:   text1      TYPE string       VALUE 'I know you know',
            text2(18)  TYPE c LENGTH 18  VALUE 'I know you know',
            sub_string TYPE string       VALUE 'know',
            new        TYPE string       VALUE 'should know that'.
    REPLACE sub_string WITH new INTO text1.
    REPLACE sub_string WITH new INTO text2.

  • Replacing non-ASCII characters with HTML charcter references

    Hi All,
    In Oracle 10g or greater is there a built-in function that will convert a string with non-ASCII characters like this
    a b č 뮼
    into an ASCII string with HTML character references like this?
    a b & # x 0 1 0 D ; & # x B B B C ;
    (note I had to include spaces between each character in the sample code for message to prevent the forum software from converting my text)
    I tried using
    utl_i18n.escape_reference( val, 'us7ascii' )
    but for some reason it returns
    a b c & # x B B B C ;
    Note how it converted the Western European character "č" to its unaccented counterpart "c", not "& # x 0 1 0 D ;" (is this a bug?).
    I also tried a custom solution using regexp_replace and asciistr (which I can't include here because the forum software chokes on it) but it only returns the correct result for values <=4000 characters long. Unfortunately asciistr doesn't appear to accept CLOB values larger than 4000 characters. It returns an error message like
    (ORA-22835: Buffer too small for CLOB to CHAR or BLOB to RAW conversion (actual: 30251, maximum: 4000) ).
    I'm looking for a solution that works on CLOB data of any size.
    Thanks in advance for any insight you can provide.
    Joe Fuda

    So with that (UTF8) in mind, let's take another look.....
    As shown below, I used a AL32UTF8 database.
    Note: I did not use a unicode capable tool for querying. So I set console mode code page to 1250 just to have č displayed properly (instead of posing as an è).
    Also, as a result of using windows-1250 for client character set, in the val column and in the second select's ncr column (iso8859-1), è (00e8) has been replaced with e through character set conversion going from server back to client.
    Running the same code on a database with a db character set such as we8mswin1252, that doesn't define the č (latin small c with caron) character, would yield results with a c in the ncr column.
    C:\>chcp 1250
    Aktuell teckentabell: 1250
    C:\>set nls_lang=.ee8mswin1250
    C:\>sqlplus test/test
    SQL*Plus: Release 11.1.0.6.0 - Production on Fri May 23 21:25:29 2008
    Copyright (c) 1982, 2007, Oracle.  All rights reserved.
    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production
    With the OLAP option
    SQL> select * from nls_database_parameters where parameter like '%CHARACTERSET';
    PARAMETER              VALUE
    NLS_CHARACTERSET       AL32UTF8
    NLS_NCHAR_CHARACTERSET AL16UTF16
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'us7ascii') NCR from dual;
    VAL  NCR
    č e  c e
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'we8iso8859p1') NCR from dual;
    VAL  NCR
    č e  &# x10d; e     <- "è"
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'ee8iso8859p2') NCR from dual;
    VAL  NCR
    č e  č &# xe8;
    SQL> select unistr('\010d \00e8') val, utl_i18n.escape_reference(unistr('\010d \00e8'),'cl8iso8859p5') NCR from dual;
    VAL  NCR
    č e  &# x10d; &# xe8;In the US7ASCII case, where it should be possible for all non-ascii characters to be escaped, it seems as if the actual escape step is skipped over.
    Hope this helps to understand whether utl_i8n is usable or not in your case.
    Message was edited by:
    orafad
    Fixed replaced character references :)

  • Non-ascii character problem

    hi
    Scenario: Reading Flat-File and writing into Oralce Table.
    In my database procedure, I have declared a column say X as VARCHAR2 to store string values. In the flat-file, some times the data for column X comes as Non-ASCII string values (e. SäHKOTOIM) and because of this the database procedure raises ORA-06502:PL/SQL:numeric or value error.
    So, how I can identify that the flat file has non-ascii values so that I can reject that record and move with another record?
    your suggestion will be greatly appreciated.
    Regards
    shakeel

    Hi,
    You set you nls_database_parameters which is compatiable with you input (non- ascii) characters.
    Try to use the LCSSCAN in order to know the relevant character set with respective to input and set the nls_parameters
    https://students.kiv.zcu.cz/doc/oracle/server.102/b14225/ch12scanner.htm#i1016925
    - Pavan Kumar N

Maybe you are looking for

  • How to read particular Sector and Block in Mifare 1k through Java?

    Hi Friends.. Do you know how to read particular Sector and Block in Mifare 1K through Java?.. I've created the simple application that read data from Mifare 1K, but i've problem when i want to read the other sectors and blocks.. i tried to read the b

  • SQL 2008 R2 standard reports - question about server dashboard and what it refers to as "adhoc" queries

    So I have started looking at some of the standard reports available with SSMS and in particular, the "server dashboard".   One thing that caught my attention were the charts that referred to "adhoc" queries.  I wondered how those were being defined,

  • Select a portion of a Voice Memo

    I recorded a Voice Memo on my 5s in order to capture an abnormal sound from my computer for tech support.  It took 15 minutes before the sound occurred, so I've got a 15-minute Voice Memo, which syncs just fine into iTunes on my PC.  However, this is

  • HT201197 Crash while sharing

    I have a simple file in MP4 with a duration of about 45 minutes. I have not edited it, only add a little movie file (about 30 seconds) at the begin and at the end. After about 20 hours of sharing process the FCPX crash. Here the log. Can somebody hel

  • In Yahoo email, how do I get the autofill to work in the address line?

    When I use Google or Explorer, the autofill works fine but not in Firefox. When I start to send an email or forward one, and start to put in the address, a window used to pop up with several address possibilities, but not any more. Why and how do I f