Support for Unicode Strings
Hi
I was wondering if Berkeley DB supports storing and retrieving Unicode strings (I assume it does, but I wanted to confirm), and if so, what encoding it uses.
Thanks,
KarthikR
Again, I'm not an authority, but I believe so, yes. From what I've read, BDB only knows about "data," which is simply a sequence of bytes. Thus, in C you could define a struct that contains your specific schema for keys and values, but you'd just write the raw bytes behind the struct to BDB, which neither knows nor cares about your particular format.
So you'd probably have to convert your Unicode string to a byte sequence one way or another. UTF-8 is the first option that comes to mind, but maybe UTF-16 or UTF-32 (if you really don't care about space) would be simpler to implement.
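To illustrate that conversion, here is a minimal Python 3 sketch; a plain dict stands in for the real DB handle, and the put_string/get_string names are just illustrative, but the bsddb3 package's DB.put/DB.get deal in raw byte strings the same way:

```python
# A plain dict stands in for a Berkeley DB handle here; bsddb3's
# DB.put/DB.get also take and return raw byte strings, so the
# encode/decode pattern is identical against the real library.
store = {}

def put_string(key, text):
    # BDB only sees bytes, so encode the Unicode string on the way in.
    store[key.encode('utf-8')] = text.encode('utf-8')

def get_string(key):
    # BDB hands back exactly the bytes it was given; decode on the way out.
    return store[key.encode('utf-8')].decode('utf-8')

put_string('greeting', 'こんにちは')
print(get_string('greeting'))  # the string round-trips intact
```

The key point is that the encoding choice lives entirely in the application; BDB itself never inspects the bytes.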
Hope this helps,
Daniel
Similar Messages
-
Call upon even better support for Unicode
Hello
Following some messages I have posted regarding problems I encountered while developing a non-English web application, I would like to call upon an even better support for Unicode. Before I describe my call, I want to say that I consider Berkeley DBXML a superb product. Superb. It lets us develop very clean and maintainable applications. Maintainability is, in my view, the keyword in good software development practices.
In this message I would like to remind you that the 7-bit US-ASCII character set only represents about 0.4% of all characters in the world. It is also true that most of our software comes from the efforts of American developers, for which I am of course very grateful.
But problems with non-US-ASCII characters are very, very time-consuming to solve. To start with, our operating systems need to be configured especially for Unicode, our servers too, our development tools too, our source code too and, finally, our data too. That's a lot of configuring, isn't it? Believe me, as a Flemish French- and Danish-speaking developer who is currently developing a new application in Portuguese, I know what I am talking about.
Have you ever tried to write a Java class called Ação.java that loads an XML instance called Ação.xml containing something like <?xml version="1.0" encoding="utf-8"?><ação variável="descrição"/>? It takes at least twice as long to get all of this working in a web application on a Linux server as it would take to write an Acao.java that loads Acao.xml containing <?xml version="1.0" encoding="us-ascii"?><acao variavel="descricao"/> (which is clearly something we do not want in Portugal).
I have experienced a problem while using the dbxml shell to load documents that have UTF-8 encoded names; see "difficulties retrieving documents with non ascii characters in name". The workaround is not to use the dbxml shell, which of course does not make me very happy.
So, while trying not to be arrogant, and while trying to express my very great appreciation for this great product, I call upon even better support for Unicode. After all, when the rest of us, who use the other 65279 characters in our software, are able to use this great product without problems, will it not contribute to the success of Berkeley DBXML?
Thank you
Koen
Edited by: koenheene on 29/Out/2009 3:09
Hello John, and thank you for replying,
You are completely correct that it is a shell problem. I investigated and found solutions for running dbxml in a Linux shell. On Windows, as one could expect, no solution so far.
Here is an overview of my investigation, which I hope will be useful for other developers who also persist in writing code and XML in their own language.
difficulties retrieving documents with non ascii characters in name
I was wondering, though, whether it would not be possible to write the dbxml shell in such a way that it becomes independent of the encoding of the shell. Surely that must be possible, no? Rewrite dbxml in Java? Any candidates :-) ?
Thanks again for the very good work,
Koen -
System call support for Unicode
Hi Solaris guru,
One of my applications (C, Solaris 2.7) is required to work in multiple languages. This application makes use of system calls and C library calls. Is it possible for a Japanese user to create file names in Japanese? If so, how will I be able to use these names (let's assume Unicode) with standard system calls and library routines that take file names as char *?
I have noticed that Solaris provides wchar_t (wchar.h) and wide-string library calls (e.g. wprintf, wscanf, wcscmp, etc.). Are there any similar w-versions of the system calls?
I greatly appreciate your help.
Cheers
Ramesh
I don't know of a Solaris system call to copy files. I do know there is no such C or C++ standard library function.
It's easy enough to write a file copy routine, however.
C++ 4.2 is obsolete and no longer supported. It predates the 1998 C++ standard by a few years.
But using old-style C++, here is a copy-file routine:
#include <fstream.h>
int copyfiles(const char* i, const char* o)
{
    ifstream in(i, ios::in|ios::binary);
    ofstream out(o, ios::out|ios::binary);
    out << in.rdbuf();
    return !(!in || !out); // 1 on success, 0 on failure
}
You pass it the names of the input and output files. It opens the files in binary mode, copies input to output if possible, and reports status by returning 1 for success and 0 for failure.
Using standard C++, the routine looks like this:
#include <fstream>
bool copyfiles(const char* i, const char* o)
{
    std::ifstream in(i, std::ios::binary);
    std::ofstream out(o, std::ios::binary);
    out << in.rdbuf();
    return !(!in || !out);
}
-
Disappointing lack of support for Unicode
I was very disappointed to find that Pages 2 still cannot properly support Unicode. TextEdit does a vastly better job of it. I can paste Unicode text into Pages 2 that I have already edited in TextEdit, but the full range of Unicode cannot be typed or edited directly in Pages 2. Some Unicode is okay, but not all of it; it absolutely can't be done. When are they going to get it right?
Does Pages 2 do the double-overstrike formatting that we discussed re Navajo some time back? Pages 1, unlike TextEdit, wouldn't let you click an ogonek or an acute accent onto a vowel that already had the other. (I was looking to make the Navajo a- or o-with-acute-accent-and-ogonek.)
I think so. I believe even Pages 1 was able to do that right after some OS update, but I can't remember now. Anyway, I just tested Pages 2 (on 10.3.9) and made the a and o with both accents using Option-Shift-m for combining ogonek and Option-Shift-e for combining acute (US Extended layout), in the Lucida Grande font. -
Hi,
I have a problem with retrieving Japanese characters from the Oracle Database.
Here is the scenario.
Environment
I have an Oracle 8i database that has been configured with the AL24UTFFSS character set in NLS_DATABASE_PARAMETERS. From UDS, some Japanese characters were inserted into tables of this database with "AMERICAN_AMERICA.JA16SJIS" as the NLS_LANG environment variable on the machine that runs the DBSession.
Positive result from iSQL*Plus
iSQL*Plus supports Unicode, and the results are as expected. I opened a session with NLS_LANG set to "AMERICAN_AMERICA.AL24UTFFSS" and was able to retrieve the Japanese characters with a SELECT statement with no problem.
Negative Results from UDS 5.0.15
Understanding that, starting with release 5.0 SP1, UDS offers full support of the Unicode codeset, I was expecting similar results from the UDS client application. The DBSession was established with NLS_LANG set to "AMERICAN_AMERICA.AL24UTFFSS" (the same as for iSQL*Plus). When I executed a small test application on a client PC with FORTE_LOCALE set to "ja_jp.UTF8", I only saw junk characters on the screen. I have even tried setting FORTE_LOCALE to "ja_jp.sjs". I get junk characters in all cases unless I set "AMERICAN_AMERICA.JA16SJIS" as the NLS_LANG environment variable on the machine that runs the DBSession, which I don't want, as I want Korean, Chinese and additional character sets handled properly as well, leveraging the fact that UDS (formerly Forte) now supports Unicode.
We have multiple advantages if we can set a single NLS_LANG to support multiple languages. Apparently there is no problem on the Oracle side, as I could retrieve the characters from iSQL*Plus.
I appreciate your help in this.
Thank you
GS
Hi,
In the description above I mentioned that we are using UDS 5.0.15.
Forte says that, starting with release 5.0 SP1, UDS offers full support of the Unicode codeset.
http://sunsolve.sun.com/pub-cgi/retrieve.pl?doc=finfodoc%2F67723
Does that mean I have a version prior to the one that supports Unicode?
Regards
GS -
Hello, I am in the process of upgrading database installation scripts so they will support Unicode. I just want to clarify that by changing the character set to, for example, AL32UTF8 and the national character set to UTF8, the database will then be able to support Unicode. Do I also need to change all the VARCHAR2 and CHAR data types to NVARCHAR2 and NCHAR? When changing the character sets, does the database then default to bytes instead of characters for multibyte character storage? Thank you.
-- David
You would not want a situation where some clients have a database character set of AL32UTF8 and are storing the data in CHAR/VARCHAR2 columns, while other clients have a non-Unicode database character set, a Unicode national character set, and store their Unicode data in NCHAR/NVARCHAR2 columns (I'm assuming from the context that you are some sort of application vendor, so that different clients are trying to run the same application). That would massively increase the complexity of your application code and make testing and supporting the application substantially more difficult.
If at all possible, it is preferable to change the database character set to Unicode for existing databases. This may involve exporting & importing some or all of the data or it may be possible online (there is a chapter in the Globalization Support document that covers character set migration and the various options you have).
Storing data in NCHAR/ NVARCHAR2 columns should generally be a last resort (unless you really know what you are doing and want to leverage different Unicode encodings). You are likely to cause yourself all sorts of headaches trying to support national character set data types.
Justin -
Does Brio 8 (or any of the subsequent Hyperion versions) provide support for Unicode characters? If not, how do we tackle reporting from databases containing non-English characters like German, Japanese, etc.? Thanks. Cheers.
It's not what you describe, but here is more detail on what I'm doing.
This is an example of the value string I'm storing. It's a simple XML object, converted to a unicode string, encoded in UTF-8:
u'<d cdt="1267569920" eml="[email protected]" nm="\u3059\u3053\u3099\u304f\u597d\u304d\u306a\u4e16\u754c" pwd="2689367b205c16ce32ed4200942b8b8b1e262dfc70d9bc9fbc77c49699a4f1df" sx="M" tx="000000000" zp="07030" />'
The nm attribute is Japanese text: すごく好きな世界
So when I add a secondary index on nm, my callback function is an xml parser which returns the value of a given attribute:
Generically it's this:
def callbackfn (attribute):
"""Define how the secondary index retrieves the desired attribute from the data value"""
return lambda primary_key, primary_data: xml_utils.parse_item_attribute(primary_data, attribute)
And so for this specific attribute ("nm"), my callback function is:
callbackfn('nm')
As I said in my original post, if I add this to the db, I get this type error:
TypeError: DB associate callback should return DB_DONOTINDEX/string/list of strings.
But when I do not place a secondary index on "nm", the type error does not occur.
So that's consistent with what Sandra wrote in the other post, i.e.:
"Berkeley DB never operates on the value part of a record. Values are simply payload, to be stored with keys and reliably delivered back to the application on demand."
My guess is that I need to add an additional UTF-8 encoding or decoding step to the callback function, or else define a custom comparison function so the callback will know what to do with the nm attribute value, but I'm not sure what exactly.
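That guess points in the right direction: the associate() callback must return bytes, not a unicode string. Here is a minimal sketch of that fix in Python; DB_DONOTINDEX is a stand-in for the real bsddb constant, and parse_item_attribute is the poster's own (hypothetical, to this sketch) parser from above:

```python
DB_DONOTINDEX = object()  # stand-in for the real bsddb DB_DONOTINDEX constant

def make_callback(attribute, parse_item_attribute):
    """Build a secondary-index callback that returns UTF-8 bytes,
    as the DB associate() API expects."""
    def cb(primary_key, primary_data):
        value = parse_item_attribute(primary_data, attribute)
        if value is None:
            # Attribute absent: tell BDB not to index this record.
            return DB_DONOTINDEX
        # Encode the unicode attribute value to bytes before returning it.
        return value.encode('utf-8')
    return cb
```

Reading the secondary key back would then symmetrically require a key.decode('utf-8') on the application side.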
Can TestStand 2.0.1 store and compare strings (local variables, etc.) as Unicode strings? If so, how can I configure TestStand to do it? Is there built-in support for such strings in newer versions of TestStand? I have not found sufficient information on the matter in the DevZone forums.
Hi Paul,
Try this example, I'm sure I downloaded this from NI website.
This just reads keys from an INI file, but you can also write to an INI file by using the appropriate functions in the DLL.
Hope this helps
Regards
Ray Farmer
Attachments:
ReadIniFile.seq 17 KB
IOWrite.ini 1 KB -
We are trying to localize a PDA using the J2ME toolkit. Do PDAs support Unicode? Could you help me with which models do and which don't? Is it possible to install Unicode fonts on a PDA device? Any tutorials would be appreciated. We have various emulators, including the Palm OS emulator.
Thanking you in advance,
Kushal
-
Anyone else have problems with unicode strings with ADFm?
I found some very strange behavior (a bug?) with unicode strings passed through ADFm bindings to EJB method parameters.
The method is invoked only once when non-unicode (plain ASCII) characters are involved.
The same PageDef invokes the method twice (?!) when the parameter receives unicode characters from the page! Not only that, but the second time the method is invoked with completely messed-up string values (two-byte unicode chars replaced with '?').
Obviously this behavior makes development of any globalized application (with unicode inputs) impossible.
Am I doing something wrong here?
In several other threads I found complaints about JDev (even 11g) not handling Unicode/UTF-8 properly. But I could not find any Oracle statement on this issue, or a commitment to true and easy Unicode application development support.
Hi Steve,
I have just sent you an email with a test case. You are the 5th person from Oracle I have sent this same email to. :)) But I have never received any answer, comment, confirmation or even rejection of the case. Please understand that for users from non-ASCII regions this is a BIG issue. If I cannot have my local letters in ADF then everything else is irrelevant. No one will buy my SW even if I give it away for free (especially since they would have to pay for Oracle licenses, which are somewhat far from free).
Kind regards,
Pavle -
How to use Unicode strings for tool titles?
So AIToolSuite is one of the few remaining Illustrator suites that does not accept ai::UnicodeString objects as arguments. I would really (really!) like to get some Unicode characters into some of my tool titles and tips, but figuring out what goes on behind the sAITool->AddTool() call, with its char* arguments, is tough.
I have tried the obvious stuff, like passing UTF-8 multibyte data, which doesn't work. I have tried more nebulous approaches, like passing Z-string data with Adobe's strange way of "escaping" Unicode code points (they typically use "^U+1234" instead of the de facto "\u1234"), which doesn't work either.
The result is that Unicode data either doesn't show at all, or shows as the typical Latin-based garbage full of diacritical marks (e.g. "éáÊöãÀ") which is nowhere near the original data that was passed in.
Anybody have some insight?
PS: I plan on filing a feature request shortly for ai::UnicodeString support in AIToolSuite, but until then...
Thanks,
- Garrett
Hello Mark -
At this time TestStand has no unicode support. The multi-byte support that we do offer is based on the Windows architecture that handles Asian language fonts. It really isn't meant to provide a bridge for unicode values in TestStand. Certainly, your Operator Interface environment will have its own support level for unicode, i.e. at this time neither LabWindows/CVI version 6.0 nor LabVIEW 6.1 officially support unicode characters. This is why you will see that the units defined in the TestStand enumerators are all text-based values.
I have run a quick test here, probably similar to what you were doing on your end, and I am uncertain whether you will get the database behavior you want from TestStand. The database logging steps and API all use basic char-style strings to communicate with the driver. Even though you are reading in a good value from Excel, TestStand interprets the character as the nearest ASCII equivalent, i.e. "Ω" will be stored and sent to the database as "O". If you have a stored procedure in Oracle that calls on some TestStand variable or property string as an input, it is doubtful you will get correct transmission of the values to the database. If your stored procedure could read from a spreadsheet directly, you would probably have better luck.
Regards,
Elaine R.
National Instruments
http://www.ni.com/ask -
Multi-language support for user-specified text strings used in the forms
Instead of creating multiple forms, one in each language, for the same service, is there any workaround?
Hoan - is your question what the considerations are when creating multilingual catalogs? If so, I can tell you that at other clients I have seen a single catalog used for one or two languages. For two languages, such as Spanish/English, you can create a single catalog containing both. Once you get to more than two languages, the catalog would become unwieldy and is therefore not suggested.
-
XDK support for UTF-16 Unicode
Hi,
Does the Oracle Java XDK, specifically the XSQL servlet and API, support UTF-16 Unicode?
Presumably, .xsql files have to be stored, read, and queries executed in a Unicode-compliant format for this to work. Is this currently possible?
Thanks,
- Manish
If you are using XDK 9.0.1 or later with JDK 1.3, that combination supports UTF-16. XSQL inherits the support for free in this case. -
SQL Azure indexer support for Collection(Edm.String)
Is there a plan to support "Collection(Edm.String)" with a SQL Azure indexer? Maybe via an XML type?
Unless I'm misunderstanding the supported types
https://msdn.microsoft.com/en-us/library/azure/dn946880.aspx
It sort of fizzles out after "time, timespan", but I'm assuming it's "Not Supported"s all the way down.
http://feedback.azure.com/forums/263029-azure-search/suggestions/7189214-sql-azure-indexer-support-for-collection-edm-strin
Was going to start there, but just wanted to vet that it indeed wasn't there... I realize it's a bit awkward and anti-SQL to store data like that in a column, and it will probably annoy DBAs.
Currently the data we'd use this for would be customer phone numbers, addresses, VINs for vehicles, and some account numbers... so nothing super fancy. Straight delimiters might get funky with addresses, but maybe a standard backslash escape sequence, or letting the user supply an ASCII hex code if the delimiter appears in the text, would work.
For now we already have a comma-separated SearchText field we've indexed for use with FTS, and I just pointed an Edm.String field at that column in our DB; it seems to pick up all the comma-separated elements. But I'm guessing it's not as efficient as if it were stored in a proper collection.
Mathematical Font for Unicode Support
Hi everyone,
Does anyone know of a font that supports most of the mathematical symbols in Unicode 5.0? By that I mean the following tables from Unicode 5.0 (as many as possible!):
- Math Operators U+2200 - U+22FF
- Supplemental Math Operators U+2A00 - U+2AFF
- Misc. Math Symbols - A U+27C0 - U+27EF
- Misc. Math Symbols - B U+2980 - U+29FF
I have tried a lot of fonts but no luck so far. I could really use your help, guys!
jhxcx,
It's currently not on our roadmap to include Unicode support in Measurement Studio VC++. You can make a product suggestion at http://digital.ni.com/applications/psc.nsf/default?OpenForm&temp1=&node= to keep the issue on R&D's radar.
Richard S -- National Instruments --Applications Engineer -- Data Acquisition with TestStand