Unicode Normalization Form?
Hi Folks,
To stay consistent with my Apache host, I've chosen UTF-8 as my default page encoding under "New Document" in DW 8 Preferences.
Dreamweaver Help implies that you need to specify a "Unicode Normalization Form" if you select Unicode (UTF-8). It suggests using "...Normalization Form C because it's the most common one used in the Character Model for the World Wide Web. Macromedia provides the other three Unicode Normalization Forms for completeness."
Is specifying a normalization form absolutely necessary? If so, is Form C the proper choice?
Any advice appreciated. Thanks.
Phil
Lost in the World of Encoding
Normalization form does not really apply to UTF-8, so it doesn't matter; for UTF-16 and wider encodings it does. In any case, there's no harm in specifying Form C anyway.
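For context, the four forms differ in whether accented letters are stored precomposed or as a base letter plus a combining mark. A small sketch with Java's built-in java.text.Normalizer (my own illustration, not from this thread):

```java
import java.text.Normalizer;

public class NormalizationDemo {
    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT (two code points)
        // Form C composes to the single code point U+00E9 (é)
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        // Form D decomposes back to base letter + combining mark
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);
        System.out.println(nfc.length()); // 1
        System.out.println(nfd.length()); // 2
    }
}
```

Either form encodes fine as UTF-8; Form C is simply the shape most web software expects, which is why the W3C character model recommends it.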
"Phil Papeman" <plpapeman@remove_comcast.net> wrote in
message
news:eklf99$8ps$[email protected]..
> Dreamweaver "Help" implies that you need to specify a
"Unicode
> Normalization Form" if you select Unicode (UTF-8). It
suggests using
> "...Normalization Form C because it's the most common
one used in the
> Character Model for the World Wide Web. Macromedia
provides the other
> three Unicode Normalization Forms for completeness."
>
Similar Messages
-
Can anyone guide me through the steps to use Unicode in Forms 10g?
I have set the computer environment variable NLS_LANG, as well as the settings in default.env and nls_lang in regedit, all to 'AMERICAN_AMERICA.UTF8'.
Thanks, Billy.
Yes, you are right.
Here is why I discarded that option:
I may get the source files with changing layouts.
My actual scenario is as follows.
Initially we developed everything using PL/SQL packages, and it works fine.
But according to the inputs we received from the requirements group, the file structure changes dynamically, and we should be able to take those new columns into account as well. We should be able to change the rules dynamically.
Let's say we are doing a full outer join on Src_A and Src_B, on columns col1_A and col1_B.
Now the requirement changes so that the join should be done on Src_A and Src_C, on columns col1_A and col_C.
For this I would need to define a new package.
Instead of that, I would like to do everything dynamically, based on configuration parameters given as input.
Thank you,
Regards,
Gowtham Sen -
Dear all,
What will the impact be on a Forms 6i form if the database is migrated to a Unicode database? Is it enough to change NLS_LANG to display Unicode fonts, or should I recompile the form or change the char or varchar fields?
Thanks a lot !!
Best Regards.
Sara Garcia
Yes, Forms 6i supports running in Unicode (UTF8) when web-deployed, or in client/server on Unicode-compliant clients (Windows NT and 2000).
Your database will of course have to be running in Unicode as well in order to store the full range of characters.
To enable Unicode in Forms, simply set the character set portion of the NLS_LANG environment variable to UTF8. -
Unicode normalisation form C and Apple Safari
Technically, the World Wide Web Consortium specifies normalisation form C. This suggests that Apple Safari 3.2.1 should not establish equivalences (aliases, synonyms) to other coded characters if a coded character has no specified equivalences as per normalisation form C. Nonetheless, Apple Safari 3.2.1 does seem to establish equivalences.
Anyone have any thoughts?
/hh
Reference:
http://unicode.org/faq/normalization.html#6
According to Asmus Freytag, the character LATIN SMALL LETTER LONG S was encoded because it involves a semantic distinction, as opposed to a stylistic distinction, in that writing system.
It does have a compatibility decomposition to "s", however.
If the World Wide Web Consortium specifies Form C, if no equivalences are established for LATIN SMALL LETTER LONG S under Form C, and if HTML is opened from disk in a browser, then LATIN SMALL LETTER S should not succeed as a synonym in a search string, it seems. Thus, "Faust" should not successfully retrieve "Fauſt".
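This is easy to verify with Java's java.text.Normalizer (my sketch, not from the thread): LONG S survives the canonical forms (C and D) untouched, and only the compatibility forms (KC/KD) fold it to "s".

```java
import java.text.Normalizer;

public class LongSDemo {
    public static void main(String[] args) {
        String longS = "\u017F"; // LATIN SMALL LETTER LONG S (ſ)
        // Canonical normalization (Form C) leaves it alone: no canonical equivalence to "s"
        System.out.println(Normalizer.normalize(longS, Normalizer.Form.NFC).equals("s")); // false
        // Compatibility normalization (Form KC) applies the compat decomposition to "s"
        System.out.println(Normalizer.normalize(longS, Normalizer.Form.NFKC)); // s
    }
}
```

So a search that matches Faust against Fauſt is behaving as if it normalized with a compatibility form (or applied its own folding), not Form C.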
I guess that would be correct under String Identity Matching as defined here?
http://www.w3.org/TR/charmod-norm/#sec-IdentityMatching
It seems true that Edit > Find uses a less restrictive matching system, but it's not clear to me that doing that is contrary to the standards in some way.
It's not just Safari but all apps where you can find ſ by looking for s, right?
And the Related Characters pane in the Character Palette should not show that LATIN SMALL LETTER S and LATIN SMALL LETTER LONG S are unconditional synonyms. Rather, they are conditional synonyms.
I'm somewhat mystified as to exactly what the Related Characters pane is supposed to show, other than characters that look similar. I wonder how Apple chooses them? In any case I would hope the compatibility decomposition of a character would appear there.
It seems to me that search services can compete by seeming more successful, and to seem more successful they can establish equivalences that are broader than the equivalences an author is entitled to expect based on specifications and standards.
I think the w3c standards about this are mainly related to the form a text should have on the web, rather than what results search services should return (unless the services perhaps specify they are doing a "string identity match"). -
How many normal forms do we have in total in SQL Server?
I only know 3 normal forms.
From: http://en.wikipedia.org/wiki/Database_normalization
The main normal forms are summarized below.
1NF (First normal form), two versions: E.F. Codd (1970) and C.J. Date (2003). The domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
2NF (Second normal form), E.F. Codd (1971). No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.
3NF (Third normal form), two versions: E.F. Codd (1971) and C. Zaniolo (1982). Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed.
EKNF (Elementary key normal form), C. Zaniolo (1982). Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.
BCNF (Boyce-Codd normal form), Raymond F. Boyce and E.F. Codd (1974). Every non-trivial functional dependency in the table is a dependency on a superkey.
4NF (Fourth normal form), Ronald Fagin (1977). Every non-trivial multivalued dependency in the table is a dependency on a superkey.
5NF (Fifth normal form), Ronald Fagin (1979). Every non-trivial join dependency in the table is implied by the superkeys of the table.
DKNF (Domain/key normal form), Ronald Fagin (1981). Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
6NF (Sixth normal form), C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002). The table features no non-trivial join dependencies at all (with reference to the generalized join operator). -
Problem inserting text with special Hungarian characters into MySQL database
When I insert text into my MySQL db, the special Hungarian characters (ő, ű) change into "?".
When I check the
<cfoutput>#FORM.special_character#</cfoutput> it gives
me the correct text, things go wrong just when writing it into the
db. My hosting provider said the following: "please try to
evidently specify "latin2" charset with "latin2_hungarian_ci"
collation when performing any operations with tables. It is
supported by the server but not used by default." At my former hosting provider I had no such problem. Anyway, how could I do what my hosting provider suggested? I read a PHP-related article that said to use "SET NAMES latin2". How could I do such a thing in ColdFusion? Any suggestions? Besides, I've tried to use UTF8 and
Latin2 character encoding both on my pages and in the db but with
not much success.
I've also read a French language message here in this forum
that suggested to use:
<cfscript>
setEncoding("form", "utf-8");
setEncoding("url", "utf-8");
</cfscript>
<cfcontent type="text/html; charset=utf-8">
I've changed the utf-8 to latin2 and even to iso-8859-2, but it didn't help.
Thanks, Aron
I read that it would be the most straightforward way to do everything in UTF-8 because it handles special characters well, so I've tried to set up a simple testing environment. Besides, I use CF MX7 and my hosting provider creates the dsn for me, so I think the db driver is JDBC, but I'm not sure.
1.) In Dreamweaver I created a page with UTF-8 encoding, set the Unicode Normalization Form to "C", and checked the Include Unicode Signature (BOM) checkbox. This created a page with the meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. I've checked the HTTP header with an online utility at delorie.com and it gave me the following info:
HTTP/1.1, Content-Type: text/html; charset=utf-8, Server:
Microsoft-IIS/6.0
2.) Then I put the following codes into the top of my page
before everything:
<cfprocessingdirective pageEncoding = "utf-8">
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("FORM", "utf-8")>
<cfcontent type="text/html; charset=utf-8">
3.) I wrote some special Hungarian chars
(<p>őű</p>) into the page and they displayed
well all the time.
4.) I've created a simple MySQL db (MySQL Community Edition
5.0.27-community-nt) on my shared hosting server with phpMyAdmin
with default charset of UTF-8 and choosing utf8_hungarian_ci as
default collation. Then I created a MyISAM table and the collation was automatically applied to my varchar field, into which I stored data with special chars. I've checked the properties of the MySQL
server in MySQL-Front prog and found the following settings under
the Variables tab: character_set_client: utf8,
character_set_connection: utf8, character_set_database: latin1,
character_set_results: utf8, character_set_server: latin1,
character_set_system: utf8, collation_connection: utf8_general_ci,
collation_database: latin1_swedish_ci, collation_server:
latin1_swedish_ci.
5.) I wrote a simple insert form into my page and tried it
using both the content of the form field and a hardcoded string
value and even tried to read back the value of the
#FORM.special_char# variable. In each case the special Hungarian chars changed to "q" or "p" letters.
Can anybody see something wrong in the above, or have an idea to test something else?
I am thinking about trying this same page against a db on my other hosting provider's MySQL server.
Here is the link to the form:
http://209.85.117.174/pages/proba/chartest/utf8_1/form.cfm
Thanks, Aron -
Problem inserting special Hungarian characters into db
Hi,
I've posted this question in the database connection forum
but put it here too because I don't know where to fit better.
I read that it would be the most straightforward way to do everything in UTF-8 because it handles special characters well, so I've tried to set up a simple testing environment. Besides, I use CF MX7 and my hosting provider creates the dsn for me, so I think the db driver is JDBC, but I'm not sure.
1.) In Dreamweaver I created a page with UTF-8 encoding, set the Unicode Normalization Form to "C", and checked the Include Unicode Signature (BOM) checkbox. This created a page with the meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. I've checked the HTTP header with an online utility at delorie.com and it gave me the following info:
HTTP/1.1, Content-Type: text/html; charset=utf-8, Server:
Microsoft-IIS/6.0
2.) Then I put the following codes into the top of my page
before everything:
<cfprocessingdirective pageEncoding = "utf-8">
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("FORM", "utf-8")>
<cfcontent type="text/html; charset=utf-8">
3.) I wrote some special Hungarian chars
(<p>őű</p>) into the page and they displayed
well all the time.
4.) I've created a simple MySQL db (MySQL Community Edition
5.0.27-community-nt) on my shared hosting server with phpMyAdmin
with default charset of UTF-8 and choosing utf8_hungarian_ci as
default collation. Then I created a MyISAM table and the collation was automatically applied to my varchar field, into which I stored data with special chars. I've checked the properties of the MySQL
server in MySQL-Front prog and found the following settings under
the Variables tab: character_set_client: utf8,
character_set_connection: utf8, character_set_database: latin1,
character_set_results: utf8, character_set_server: latin1,
character_set_system: utf8, collation_connection: utf8_general_ci,
collation_database: latin1_swedish_ci, collation_server:
latin1_swedish_ci.
5.) I wrote a simple insert form into my page and tried it
using both the content of the form field and a hardcoded string
value and even tried to read back the value of the
#FORM.special_char# variable. In each case the special Hungarian chars changed to "q" or "p" letters.
Can anybody see something wrong in the above, or have an idea to test something else?
I am thinking about trying this same page against a db on my other hosting provider's MySQL server.
Here is the link to the form:
http://209.85.117.174/pages/proba/chartest/utf8_1/form.cfm
Thanks, Aron
Some new info about the advancements in my project:
I've tried to make the insertion at a third hosting provider's MySQL server with my 'everything is UTF-8' test case and IT'S DONE! There are my lovely spec chars :-)
Then I checked the char encoding, according to -Per's tip, in all of the test MySQL dbs I've used so far, and it reported 'CHARSET=utf8 COLLATE=utf8_hungarian_ci', so this part seems OK to me.
I asked my hosting provider, where my production app should run, about the db driver, and they told me it's JDBC (what version of Jconnect I still don't know). They are ready to append &characterSetResults=UTF-8 to the JDBC connection URL (somebody mentioned this as a possible solution too), but they asked me to provide the complete connection string to be used for my datasource. I've tried to compose it in my localhost development environment in ColdFusion Admin, but it gave me a Connection verification failed error. So I think I did something wrong and need help to write the correct connection string that can be passed to the hosting provider. The connection string structure I tried to use in the JDBC URL field of the datasource area of CFAdmin is something like this:
jdbc:mysql://someipaddresshere/mydbname&characterSetResults=UTF-8
How can it be corrected?
Thanks, Aron -
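For what it's worth, MySQL Connector/J introduces the first URL property after the database name with '?' and joins further properties with '&', so a bare '&characterSetResults=UTF-8' as in the string above would not parse. A sketch of the expected shape (host and database names are the poster's placeholders; the extra useUnicode/characterEncoding properties are my assumption of a typical UTF-8 setup, not something from the thread):

```java
public class JdbcUrlSketch {
    public static void main(String[] args) {
        // First property is attached with '?', subsequent ones with '&'
        String url = "jdbc:mysql://someipaddresshere/mydbname"
                + "?useUnicode=true"
                + "&characterEncoding=UTF-8"
                + "&characterSetResults=UTF-8";
        System.out.println(url);
    }
}
```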
I am confused in setting up the basic stuff.
Hi
Under: "Edit' "Preferences" "New Document"
I set the:
Default Document: HTML
Default Extension: .html
Default Document Type (DTD): HTML 4.01 Transitional
Default Encoding: Unicode (UTF-8)
I check the: Use when opening existing files that don't specify an encoding
Unicode Normalization Form: Gives me 4 choices
C (Canonical Decomposition, followed by Canonical Composition)
D (Canonical Decomposition)
KC (Compatibility Decomposition, followed by Canonical Composition)
KD (Compatibility Decomposition)
Not sure which one to choose? Need help here...
I Check: Include Unicode Signature (BOM)
I am also getting files that look like: ../../../filename.html
How do I correct this?
Help appreciated
Bob
I Check: Include Unicode Signature (BOM)
There's no reason to use BOM Unicode Signature in plain HTML documents.
It's actually better if you don't specify this.
I am also getting files that look like: ../../../filename.html
Define your site. Go to Site > New Site or Manage Sites > tell DW where your local site files are located on your hard drive.
Under Advanced Settings > Local Info, use Links Relative to Document. Save your site definition settings.
Now when you create a new document & save it, DW will manage your file assets and paths relative to the document and not the site root folder.
Nancy O. -
How to embed unicode fonts in fillable form.
Hello friends,
I created a normal Unicode PDF form. It displays perfectly.
But when I make this form fillable, the Unicode characters turn into junk and the form becomes unreadable.
Attached here is a normal Unicode PDF form, as well as form_fillable.pdf, which is garbled after being made fillable.
regards
KD
Check this forum post (though it is for 6i, it should be helpful for you):
How to use unicode fonts in Oracle forms 10g?
-Arun -
Wrong unicode characters when exporting?
Hello guys!
I exported my iTunes library to a Unicode XML file to use it with the TwonkyVision server, which feeds my Roku Soundbridge. The music files are located on a NAS station, and Twonky is also running on that NAS. All of this hardware and software supports Unicode!
In Twonky I can link to the exported XML library file, which works great: my Roku can see and play the files. But only those which don't have strange European characters in them, like ä, ö, ü, é, è, à and so on!
In the beginning I changed some of the filenames and tags and reimported the files, but the number of files grows rapidly, so this is no longer an option. After some long nights fiddling with this topic, I found out that iTunes (for Mac; I didn't test the Windows version) exported the library with a wrong Unicode translation! iTunes does the following for an ü: u%CC%88, but the correct Unicode is %C3%BC, for example. When I changed all the occurrences in the XML file and rebuilt the internal Twonky database, every missing track showed up.
To cut a long story short: is this a bug in iTunes? Can someone else confirm this? Or at least Apple? Does somebody know a solution for this? I have to do the find-and-replace action every time I export my library again, which is annoying... (but at least it works).
Thanx in advance.
-- Hudson
> I found out that iTunes (for Mac, didn't test it for the Windows version) exported the library with a wrong unicode translation! iTunes does the following for an ü: u%CC%88 but the correct unicode is %C3%BC for example.
u%CC%88 is not totally wrong. It represents the decomposed form of ü, that is, u plus a combining diaeresis. I think the OS X file system always uses this; it's called Normalization Form D. I gather you need it in the composed version, Normalization Form C. I don't know whether the failure of iTunes' export function to change the normalization form is a bug or not, but I think search/replace might be the only way to do what you require, unless there's a "normalization converter program" available. For the latter, have a look at UnicodeChecker:
http://earthlingsoft.net/UnicodeChecker/
After installation this provides a Unicode services item which includes several conversion possibilities. -
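The two percent-encoded spellings the poster saw are exactly Forms D and C of the same letter; a Java sketch (my illustration, not from the thread) makes the relationship visible:

```java
import java.net.URLEncoder;
import java.text.Normalizer;

public class UmlautFormsDemo {
    public static void main(String[] args) throws Exception {
        String nfc = "\u00FC";                                       // ü as one code point (Form C)
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD); // u + COMBINING DIAERESIS (Form D)
        System.out.println(URLEncoder.encode(nfd, "UTF-8")); // u%CC%88 -- the form iTunes/HFS+ emits
        System.out.println(URLEncoder.encode(nfc, "UTF-8")); // %C3%BC  -- the form Twonky expects
    }
}
```

Converting the exported XML to Form C with a normalizer is therefore equivalent to the manual find-and-replace described above.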
Combining Diacritical Marks - UNICODE
Hi folks,
Searched the forums and found nada. I want to convert / compose / transform Unicode chars from their decomposed form to the composed form. From the Normalizer API:
>
Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):
U+0041 LATIN CAPITAL LETTER A
U+0301 COMBINING ACUTE ACCENT
>
In my case, I have strings containing UTF-8 hex values, e.g., á: 0x61 0xcc 0x81 / U+0061 U+0301, and I want to convert them to the composed form: 0xe1 / U+00E1.
Does anyone know an API to convert from the decomposed form to the composed form? I already tried the Normalizer class, using Normalizer.Form.NFC and Normalizer.Form.NFKC, but it didn't work.
I'm looking into ICU4J now, but up to now it didn't help. Maybe someone knows how to use it properly.
Thanks in advance.
First of all, thanks for answering, DrClap. As I was writing this response I managed to make it work using ICU4J. But, for the sake of the discussion, and to aggregate information in this post (as there aren't many posts about UTF-8 combining diacritical marks in these forums), I'll answer your questions below.
DrClap wrote:
> Danniel_Willian wrote:
> > In my case, I have strings containing UTF-8 hex values, e.g., á: 0x61 0xcc 0x81 / U+0061 U+0301, and I want to convert them to the composed form: 0xe1 / U+00E1.
> I didn't understand this part. Do you mean your string contains the character '0' then the character 'x' then the character '6' then the character '1' then a space, and so on? Or does it contain the byte x61 in the first character and the byte xCC in the second character, and so on? Or what?
Sorry if I didn't make myself clear; reading it again I see that it really isn't clear. Actually I've got a byte stream, let's say from a file, and that byte stream contains those bytes, in that order:
0x61 0xcc 0x81.
These are bytes that represent in UTF-8 the following chars:
0x61 - lower case "a"
0xcc 0x81 - acute accent, a "combining diacritical mark"
> In any case your best bet is to decode the UTF-8 bytes -- if that's what you have in some way -- into Unicode characters, and then apply those normalizing methods you mentioned.
Shouldn't this be transparent? I mean, if I have a stream of valid UTF-8 bytes, shouldn't this kind of "decoding" be seamless when I create a String from it?
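Decoding the bytes is indeed transparent when the String is built with an explicit charset; what is not automatic is the composition step, which Normalizer then handles, as the quoted advice says. A minimal sketch of the whole pipeline (my example):

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class ComposeDemo {
    public static void main(String[] args) {
        byte[] utf8 = {0x61, (byte) 0xcc, (byte) 0x81}; // "a" + COMBINING ACUTE ACCENT in UTF-8
        // Decoding is seamless: the String holds the two code points U+0061 U+0301
        String decomposed = new String(utf8, StandardCharsets.UTF_8);
        // Composition is a separate, explicit step
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.length()); // 1 -> the single code point U+00E1 (á)
    }
}
```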
Best regards.
Edited by: Danniel_Willian on May 27, 2009 6:16 PM -
Unicode convertion for Czech Language
Hi all,
my system is not Unicode and I have to convert it into a Unicode one, because we are planning a roll-out project for our Czech branch. We have an ECC5.0 with English, German, Italian and Spanish languages.
I have understood, more or less, what we have to do technically, but I would like to know more about the growth in hardware space needs.
How much will my system reasonably grow after Unicode conversion and Czech language installation? 10%, 20%?
Thanks a lot.
Hi,
Have a look at the link below which will help to answer the question.
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/10317ed9-1c11-2a10-9693-ec0d9a3bc537
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/589d18d9-0b01-0010-ac8a-8a22852061a2
If the Unicode encoding form is UTF-8, database size growth will be about 10% of the original size.
If it is UTF-16, then it may grow by 60 to 70% of the original size.
Rgds
Radhakrishna D S -
Converting Unicode to plain text
Hi,
Is there any way I can get the string "Internationalization" from "Iñtërnâtiônàlizætiøn"?
You could decompose it using one of the Unicode normalizations:
http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
and then go through the result and remove all of the combining characters. -
I have to do string comparison between an English string and a similar UTF-* string.
Example: Nacori and Nácori, Rayon and Rayón.
Right now my string comparison returns failure for such cases.
Is there any Java library/class that can help to mark such cases with similar names as a pass? Or convert Rayón to a similar English word like Rayon?
I got the solution I was looking for with the help of the "diacritical" keyword. Here is the sample code:
import java.text.Normalizer;
import java.util.regex.Pattern;

public static String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
} -
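Wrapped in a runnable class, the helper above can be exercised like this (the class name and main method are mine):

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccentDemo {
    // Decompose to NFD, then strip the combining marks
    public static String deAccent(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return Pattern.compile("\\p{InCombiningDiacriticalMarks}+").matcher(nfd).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(deAccent("N\u00e1cori")); // Nacori
        System.out.println(deAccent("Ray\u00f3n"));  // Rayon
    }
}
```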
Script help.. find folders (title oversimplified..)
What I need:
Have list of folder names in Excel
Script uses list to find specified folders. Folders (items in list) are actually three folders deep in directory (external archive disk/dated folder/folder I want).
Script then copies folders to another location.
Basically, I have a main archive directory (Archive1). On that disk are dated folders (102013). Inside the dated folders there are anywhere from 10-20 folders; those contain the data (study data), and those are the folders I want. I will have a list of 50-100 data folders provided to me, all from random dates. I don't want to manually find, then drag and drop data into the new archive. I have a list, and would like to automate this.
What I have:
A script that will find FILES, not folders, and not in subdirectories. It does everything I need, if I were only looking for files in one folder.
The script I have is below. If one of you kind souls would point me in the right direction, I'd appreciate it.
If you want to test the script below:
Make a short list in Excel, maybe just a, b, c in a column.
Create two folders (I named mine archive and transfer) on your desktop. Place three dummy files named a, b, and c in the archive folder. Run the script while the spreadsheet is open. You will be asked to choose the archive folder, then the transfer folder. All files from the archive folder will then be copied to the transfer folder. The spreadsheet will be updated to include filenames with extensions in the B column.
Script:
set studiestofind to (choose folder with prompt "Choose Archive Location")
set transferto to (choose folder with prompt "Choose the Transfer Location")
tell application "Microsoft Excel"
tell active sheet
set lastIndex to first row index of last cell of (get used range)
repeat with i from 2 to lastIndex
set tFileName to my find_filenames(get value of range ("A" & i), studiestofind)
if tFileName is not "" then -- Study Found
set r to my duplicateImage(tFileName, studiestofind, transferto)
if r is not "" then -- no error on duplicate file
set value of range ("B" & i) to tFileName
else
set value of range ("B" & i) to ""
end if
end if
end repeat
end tell
end tell
on duplicateImage(tName, iFolder, mFolder)
try
tell application "Finder" to return (duplicate file tName of iFolder to mFolder) as string
on error
return ""
end try
end duplicateImage
on find_filenames(tVal, thefolder)
--if cell begins with space or ends with space , text items of tVal return empty string, empty string added in list excludedWords
set excludedWords to {"Mini", "and", ""} -- words in Menu Item to exclude in search
set otid to text item delimiters
set text item delimiters to " "
set tWords to text items of tVal
set text item delimiters to otid
set thefolder to quoted form of POSIX path of thefolder
set myListOfWords to {}
repeat with i in tWords -- must be valid
if contents of i is not in excludedWords then set end of myListOfWords to contents of i
end repeat
if (count myListOfWords) = 1 then -- more filler i dont really understand
set i to quoted form of (tVal & ".")
-- search file name
set tpath to do shell script "cd " & thefolder & " && /usr/bin/find . -type f -maxdepth 1 -iname " & i & "*"
if tpath is not "" then return text 3 thru -1 of tpath --return name of exact match
end if
repeat with i in myListOfWords
set tPaths to do shell script "cd " & thefolder & " && /usr/bin/find . -type f -maxdepth 1 -iregex '.*[_/]" & i & "[_.].*' \\! -name '.*' "
if tPaths is not "" then
set L to paragraphs of tPaths
if (count L) = 1 then return text 3 thru -1 of tPaths -- one path found
repeat with tpath in L -- many paths found, loop each path
set isGood to false
repeat with tword in myListOfWords -- check each word of this Menu Item in tpath
if ("/" & tword & "_") is in tpath or ("/" & tword & ".") is in tpath or ("_" & tword & ".") is in tpath or ("_" & tword & "_") is in tpath then
set isGood to true
else
set isGood to false
exit repeat
end if
end repeat
if isGood then return text 3 thru -1 of tpath -- each word of this Menu Item is in name of tpath
end repeat
end if
end repeat
return ""
end find_filenames
Hello
It is not hard to modify the existing script you posted so as to retrieve directories at depth 2, but I chose to rewrite it as follows because its filtering logic is very inefficient. In its current form, the script below will retrieve directories at depth 2 in the chosen folder.
The part scripting Excel is not tested, for I don't have Excel. It might fail to set the value of a cell in column B when there are multiple matches, in which case the script will try to set the cell value to multi-line text of every found path. If it fails, we can adjust the code to work around it. Your original script only processes the first matched file, but I thought it better to process every matched one.
# Notes
• Filtering logic is modified so that it is now wholly processed by find(1). Also the regexp pattern is modified so that it compares the query word with string delimited by _ and . in file name.
E.g., Given query = "apple orange bananna", the original script matches these -
apple_orange_bananna.tex
apple_bananna_orange.dvi
_bananna__orange__apple.ps
_bananna._apple._orange.pdf
_orange.blueberry_bananna_apple....djvu
but not these -
apple.orange.bananna.txt
apple_.orange_.bananna.txt
apple_orange_blueberry.html
apple_orange_bananna
The new script will match all of them except for "apple_orange_blueberry.html".
• I added code to handle the NFD-NFC issue of HFS+ names. An HFS+ name is represented as Unicode in NFD (Normalization Form D), while a name in Excel could be in NFC (Normalization Form C), in which case find(1) will not match a name whose NFD and NFC differ. This is not an issue if the name contains only a-zA-Z0-9, but, e.g., any diacriticals will cause trouble without special treatment.
• Script will write a log to a file on the desktop when a) some query gives no result or b) some of the matched files/folders are not copied because an item with the same name already exists in the destination.
• Script will behave (mostly) the same as the original script when you set the find parameters properties in _main() as -
-- find parameters
property type : "f" -- d = directory, f = file
property mindepth : 1
property maxdepth : 1
• Currently the value in column B will be the (list of) found and copied path(s) and not the found path(s). If you need all of the found path(s) to be put in column B, switch the following statements in _main():
set value of range ("B" & i) to my _join(return, qq) -- found and copied
--set value of range ("B" & i) to my _join(return, pp) -- found
• Script is tested under OSX 10.5.8. (except for Excel scripting)
# Script
_main()
on _main()
script o
-- directories and logfile
property srcdir : (choose folder with prompt "Choose Archive Location")'s POSIX path
property dstdir : (choose folder with prompt "Choose the Transfer Location")'s POSIX path
property logf : (path to desktop)'s POSIX path & "copy_log@" & (do shell script "date +'%F.txt'")
-- find parameters
property type : "d" -- d = directory, f = file
property mindepth : 2
property maxdepth : 2
-- working lists
property pp : {}
property qq : {}
property rr : {}
-- find & copy nodes
tell application "Microsoft Excel"
tell active sheet
set lastIndex to first row index of last cell of (get used range)
repeat with i from 2 to lastIndex
set query to value of range ("A" & i)
set pp to my find_filenames(query, srcdir, type, mindepth, maxdepth)
if pp = {} then -- not found
tell current application to set ts to do shell script "date +'%F %T%z'"
set entry to "Did not find a name with query word(s): " & query
my log_to_file("%-26s%s\\n", {ts, entry}, logf)
else
set qq to {}
repeat with p in my pp
set p to p's contents
set q to my cp(srcdir & p, dstdir, {replacing:false})
if q ≠ "" then
set end of my qq to p -- found and copied
else
set end of my rr to srcdir & p -- found but not copied
end if
end repeat
set value of range ("B" & i) to my _join(return, qq) -- found and copied
--set value of range ("B" & i) to my _join(return, pp) -- found
end if
end repeat
end tell
end tell
-- log duplicates
if rr ≠ {} then
set ts to do shell script "date +'%F %T%z'"
set entry to "Did not copy the following node(s) due to existing name in destination: " & dstdir
my log_to_file("%-26s%s\\n", {ts, entry}, logf)
repeat with r in my rr
my log_to_file("%-28s%s\\n", {"", r's contents}, logf)
end repeat
end if
end script
tell o to run
end _main
on log_to_file(fmt, lst, f)
(*
string fmt : printf format string
list lst : list of strings
string f : POSIX path of log file
*)
local args
set args to ""
repeat with a in {fmt} & lst
set args to args & space & a's quoted form
end repeat
do shell script "printf " & args & " >> " & f's quoted form
end log_to_file
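The logging pattern used by log_to_file() can be tried standalone in a shell; this sketch (log path and message are hypothetical) shows how printf pads the timestamp into a fixed-width 26-character column before the message:

```shell
# Demo of the script's log format: "%-26s%s\n" left-justifies the
# timestamp in a 26-char field, then appends the log entry.
log=/tmp/copy_log_demo.txt
rm -f "$log"                         # start clean for the demo
ts=$(date +'%F %T%z')
printf '%-26s%s\n' "$ts" "Did not find a name with query word(s): apple" >> "$log"
cat "$log"
```

The continuation lines in the duplicates report use a 28-character field ("%-28s") with an empty first argument, which indents them under the header line.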
on cp(src, dstdir, {replacing:_replacing})
(*
string src : POSIX path of source file or directory
string dstdir : POSIX path of destination directory
boolean _replacing : true to replace existing destination, false otherwise
return string : POSIX path of copied file or directory, or "" if not copied
*)
set rep to _replacing = true
if src ends with "/" then set src to src's text 1 thru -2
if dstdir ends with "/" then set dstdir to dstdir's text 1 thru -2
set sh to "
rep=$1
src=\"$2\"
dstdir=\"$3\"
dst=\"${dstdir}/${src##*/}\"
[[ $rep == false && -e \"$dst\" ]] && exit
cp -f -pPR \"$src\" \"$dstdir\" && echo \"$dst\" || exit $?"
do shell script "/bin/bash -c " & sh's quoted form & " - " & rep & " " & src's quoted form & " " & dstdir's quoted form
end cp
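The bash fragment that cp() hands to do shell script can be exercised on its own. This sketch (the /tmp paths are made up for the demo) shows the guard: when replacing is false and a node with the same name already exists in the destination, it exits without printing, so the AppleScript caller receives "":

```shell
rep=false
src=/tmp/cp_demo_src.txt
dstdir=/tmp/cp_demo_dst
printf 'hello\n' > "$src"
mkdir -p "$dstdir"
rm -f "$dstdir/${src##*/}"           # start clean so the copy succeeds
dst="${dstdir}/${src##*/}"
if [[ $rep == false && -e "$dst" ]]; then
  exit 0                             # name exists: print nothing
fi
cp -f -pPR "$src" "$dstdir" && echo "$dst"
```

Running it a second time with the destination left in place prints nothing, which is exactly how the main loop sorts paths into qq (copied) versus rr (skipped duplicates).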
on find_filenames(query, dir, type, mind, maxd)
(*
string query : space delimited words to search
string dir : POSIX path of target root directory
string type : node type to retrieve
f => file
d => directory
integer mind : min depth of nodes in dir to retrieve
integer maxd : max depth of nodes in dir to retrieve
return list : list of POSIX paths of found node(s)
*)
script o
property exclude_list : {"Mini", "and", ""} -- list of words to be excluded from query
property pp : _split(space, NFD(query))
property qq : {}
property rr : {}
-- exclude words in exclude_list from query words
repeat with p in my pp
set p to p's contents
if p is not in my exclude_list then set end of my qq to p
end repeat
-- build arguments for find(1)
(* query words are compared with tokens delimited by _ or . in file name *)
repeat with q in my qq
(*
e.g. Given qq = {"apple", "orange", "bananna"}, rr is a list of -
-iregex '.*[/_.]apple([_.].*|$)'
-iregex '.*[/_.]orange([_.].*|$)'
-iregex '.*[/_.]bananna([_.].*|$)'
(Note: |'s must be balanced even in a comment...)
*)
set q to quotemeta(q) -- quote non-alphanumeric characters to be recognised as literal in regexp
set end of rr to "-iregex '.*[/_.]" & q & "([_.].*|$)'"
end repeat
set |-iregex| to " \\( " & _join(" -and ", my rr) & " \\)"
set |-type| to " -type " & type
set |-mindepth| to " -mindepth " & mind
set |-maxdepth| to " -maxdepth " & maxd
-- build shell script
set sh to "export LC_ALL=en_GB.UTF-8
cd " & dir's quoted form & " || exit
/usr/bin/find -E . " & |-type| & |-mindepth| & |-maxdepth| & |-iregex| & " \\! -name '.*' -print0 |
/usr/bin/perl -CS -ln0e 'print substr($_, 2);'"
-- run shell script
return paragraphs of (do shell script sh)
end script
tell o to run
end find_filenames
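The -iregex predicates treat each query word as a whole token delimited by "_" or "." (or the leading "/") in the file name, so "apple" matches "Mini_Apple_Orange.mov" but not "pineapple_orange.mov". The same idea can be checked anywhere with grep -E, since BSD find's -E flag uses the same extended-regex syntax (the sample names below are made up):

```shell
# Each grep plays the role of one -iregex predicate joined by -and:
# a word must be preceded by / _ or . and followed by _ . or end.
printf '%s\n' \
  './Mini_Apple_Orange.mov' \
  './pineapple_orange.mov' \
  './apple.orange.final.mov' |
grep -iE '[/_.]apple([_.]|$)' |
grep -iE '[/_.]orange([_.]|$)'
```

Only the first and third names survive both filters; "pineapple" is rejected because "apple" is not preceded by a delimiter.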
on NFD(t)
(*
string t : source string
return Unicode text : t in NFD (Normalization Form D)
*)
set pl to "/usr/bin/perl -CSDA -MUnicode::Normalize <<-'EOF' - \"$*\"
print $ARGV[0] ? NFD($ARGV[0]) : qq();
EOF"
do shell script "/bin/bash -c " & pl's quoted form & " - " & t's quoted form
end NFD
on quotemeta(t)
(*
string t : source string
return string : t where all non-alphanumeric characters are quoted by backslash
*)
set pl to "/usr/bin/perl -CSDA -e 'print quotemeta $ARGV[0];' \"$*\""
do shell script "/bin/bash -c " & pl's quoted form & " - " & t's quoted form
end quotemeta
on _join(d, tt)
(*
string d : separator
list tt : source list
return string : tt joined with d
*)
local astid, astid0, t
set astid to a reference to AppleScript's text item delimiters
try
set {astid0, astid's contents} to {astid's contents, {} & d}
set t to "" & tt
set astid's contents to astid0
on error errs number errn
set astid's contents to astid0
error errs number errn
end try
return t
end _join
on _split(d, t)
(*
string or list d : separator(s)
string t : source string
return list : t split by d
*)
local astid, astid0, tt
set astid to a reference to AppleScript's text item delimiters
try
set {astid0, astid's contents} to {astid's contents, {} & d}
set tt to t's text items
set astid's contents to astid0
on error errs number errn
set astid's contents to astid0
error errs number errn
end try
return tt
end _split
Hope this may help
H
Message was edited by: Hiroto (fixed some typos)