Unicode Normalization Form?
Hi Folks,
To stay consistent with my Apache host, I've chosen UTF-8 as my default page encoding under "New Document" in DW 8 Preferences.
Dreamweaver Help implies that you need to specify a "Unicode Normalization Form" if you select Unicode (UTF-8). It suggests using "...Normalization Form C because it's the most common one used in the Character Model for the World Wide Web. Macromedia provides the other three Unicode Normalization Forms for completeness."
Is specifying a normalization form absolutely necessary? If so, is Form C the proper choice?
Any advice appreciated. Thanks.
Phil
Lost in the World of Encoding
Normalization form does not really apply to UTF-8, so it doesn't matter; for UTF-16 and wider encodings it does. In any case, there's no harm in specifying Form C anyway.
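For context, the four forms differ in whether accented letters are stored precomposed or as a base letter plus a combining mark. A small sketch with Java's built-in java.text.Normalizer (my own illustration, not from this thread):

```java
import java.text.Normalizer;

public class NormalizationDemo {
    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by COMBINING ACUTE ACCENT (two code points)
        // Form C composes to the single code point U+00E9 (é)
        String nfc = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        // Form D decomposes back to base letter + combining mark
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD);
        System.out.println(nfc.length()); // 1
        System.out.println(nfd.length()); // 2
    }
}
```

Either form encodes fine as UTF-8; Form C is simply the shape most web software expects, which is why the W3C character model recommends it.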
"Phil Papeman" <plpapeman@remove_comcast.net> wrote in
message
news:eklf99$8ps$[email protected]..
> Dreamweaver "Help" implies that you need to specify a
"Unicode
> Normalization Form" if you select Unicode (UTF-8). It
suggests using
> "...Normalization Form C because it's the most common
one used in the
> Character Model for the World Wide Web. Macromedia
provides the other
> three Unicode Normalization Forms for completeness."
>
Similar Messages
-
Can anyone guide me through the steps to use Unicode in Forms 10g?
I have set the computer environment variable NLS_LANG, as well as the settings in default.env and nls_lang in regedit, all to 'AMERICAN_AMERICA.UTF8'.
Thanks, Billy.
Yes, you are right.
Here is why I discarded that option:
I may get the source files with changing layouts.
My actual scenario is as follows.
Initially we developed everything using PL/SQL packages, and it works fine.
But according to the inputs we received from the requirements group, the file structure changes dynamically, and we should be able to take those new columns into account as well. We should be able to change the rules dynamically.
Let's say we are doing a full outer join on Src_A and Src_B, on columns col1_A and col1_B.
Now the requirement changes so that the join should be done on Src_A and Src_C, on columns col1_A and col_C.
For this I would need to define a new package.
Instead of that, I would like to do everything dynamically, based on configuration parameters given as input.
Thank you,
Regards,
Gowtham Sen -
Dear all,
What will the impact be on a Forms 6i form if the database is migrated to a Unicode database? Is it enough to change NLS_LANG to display Unicode fonts, or should I recompile the form or change the char or varchar fields?
Thanks a lot !!
Best Regards.
Sara Garcia
Yes, Forms 6i supports running in Unicode (UTF8) when web-deployed, or in client/server on Unicode-compliant clients (Windows NT and 2000).
Your database will of course have to be running in Unicode as well in order to store the full range of characters.
To enable Unicode in Forms, simply set the character set portion of the NLS_LANG environment variable to UTF8. -
Unicode normalisation form C and Apple Safari
Technically, the World Wide Web Consortium specifies normalisation form C. This suggests that Apple Safari 3.2.1 should not establish equivalences (aliases, synonyms) to other coded characters if a coded character has no specified equivalences as per normalisation form C. Nonetheless, Apple Safari 3.2.1 does seem to establish equivalences.
Anyone have any thoughts?
/hh
Reference:
http://unicode.org/faq/normalization.html#6
According to Asmus Freytag, the character LATIN SMALL LETTER LONG S was encoded because it involves a semantic distinction, as opposed to a stylistic distinction, in that writing system.
It does have a compatibility decomposition to "s", however.
If the World Wide Web Consortium specifies Form C, if no equivalences are established for LATIN SMALL LETTER LONG S under Form C, and if HTML is opened from disk in a browser, then LATIN SMALL LETTER S should not succeed as a synonym in a search string, it seems. Thus, "Faust" should not successfully retrieve "Fauſt".
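This is easy to verify with Java's java.text.Normalizer (my sketch, not from the thread): LONG S survives the canonical forms (C and D) untouched, and only the compatibility forms (KC/KD) fold it to "s".

```java
import java.text.Normalizer;

public class LongSDemo {
    public static void main(String[] args) {
        String longS = "\u017F"; // LATIN SMALL LETTER LONG S (ſ)
        // Canonical normalization (Form C) leaves it alone: no canonical equivalence to "s"
        System.out.println(Normalizer.normalize(longS, Normalizer.Form.NFC).equals("s")); // false
        // Compatibility normalization (Form KC) applies the compat decomposition to "s"
        System.out.println(Normalizer.normalize(longS, Normalizer.Form.NFKC)); // s
    }
}
```

So a search that matches Faust against Fauſt is behaving as if it normalized with a compatibility form (or applied its own folding), not Form C.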
I guess that would be correct under String Identity Matching as defined here?
http://www.w3.org/TR/charmod-norm/#sec-IdentityMatching
It seems true that Edit > Find uses a less restrictive matching system, but it's not clear to me that doing that is contrary to the standards in some way.
It's not just Safari but all apps where you can find ſ by looking for s, right?
And the Related Characters pane in the Character Palette should not show that LATIN SMALL LETTER S and LATIN SMALL LETTER LONG S are unconditional synonyms. Rather, they are conditional synonyms.
I'm somewhat mystified as to exactly what the Related Characters pane is supposed to show, other than characters that look similar. I wonder how Apple chooses them? In any case I would hope the compatibility decomposition of a character would appear there.
It seems to me that search services can compete by seeming more successful, and to seem more successful they can establish equivalences that are broader than the equivalences an author is entitled to expect based on specifications and standards.
I think the w3c standards about this are mainly related to the form a text should have on the web, rather than what results search services should return (unless the services perhaps specify they are doing a "string identity match"). -
How many normal forms do we have in total in SQL Server?
I only know 3 normal forms.
From: http://en.wikipedia.org/wiki/Database_normalization
The main normal forms are summarized below.
1NF (First normal form), two versions: E.F. Codd (1970) and C.J. Date (2003). The domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
2NF (Second normal form), E.F. Codd (1971). No non-prime attribute in the table is functionally dependent on a proper subset of any candidate key.
3NF (Third normal form), two versions: E.F. Codd (1971) and C. Zaniolo (1982). Every non-prime attribute is non-transitively dependent on every candidate key in the table. The attributes that do not contribute to the description of the primary key are removed from the table. In other words, no transitive dependency is allowed.
EKNF (Elementary key normal form), C. Zaniolo (1982). Every non-trivial functional dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.
BCNF (Boyce-Codd normal form), Raymond F. Boyce and E.F. Codd (1974). Every non-trivial functional dependency in the table is a dependency on a superkey.
4NF (Fourth normal form), Ronald Fagin (1977). Every non-trivial multivalued dependency in the table is a dependency on a superkey.
5NF (Fifth normal form), Ronald Fagin (1979). Every non-trivial join dependency in the table is implied by the superkeys of the table.
DKNF (Domain/key normal form), Ronald Fagin (1981). Every constraint on the table is a logical consequence of the table's domain constraints and key constraints.
6NF (Sixth normal form), C.J. Date, Hugh Darwen, and Nikos Lorentzos (2002). The table features no non-trivial join dependencies at all (with reference to the generalized join operator). -
Problem inserting text with special Hungarian characters into MySQL database
When I insert text into my MySQL db, the special Hungarian characters (ő, ű) change into "?".
When I check the
<cfoutput>#FORM.special_character#</cfoutput> it gives
me the correct text, things go wrong just when writing it into the
db. My hosting provider said the following: "please try to
evidently specify "latin2" charset with "latin2_hungarian_ci"
collation when performing any operations with tables. It is
supported by the server but not used by default." At my former hosting provider I had no such problem. Anyway, how could I do what my hosting provider suggested? I read a PHP-related article that said to use "SET NAMES latin2". How could I do such a thing in ColdFusion? Any suggestions? Besides, I've tried to use UTF8 and
Latin2 character encoding both on my pages and in the db but with
not much success.
I've also read a French language message here in this forum
that suggested to use:
<cfscript>
setEncoding("form", "utf-8");
setEncoding("url", "utf-8");
</cfscript>
<cfcontent type="text/html; charset=utf-8">
I've changed the utf-8 to latin2 and even to iso-8859-2, but it didn't help.
Thanks, Aron
I read that it would be the most straightforward way to do everything in UTF-8 because it handles special characters well, so I've tried to set up a simple testing environment. Besides, I use CF MX7 and my hosting provider creates the dsn for me, so I think the db driver is JDBC, but I'm not sure.
1.) In Dreamweaver I created a page with UTF-8 encoding, set the Unicode Normalization Form to "C", and checked the Include Unicode Signature (BOM) checkbox. This created a page with the meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. I've checked the HTTP header with an online utility at delorie.com and it gave me the following info:
HTTP/1.1, Content-Type: text/html; charset=utf-8, Server:
Microsoft-IIS/6.0
2.) Then I put the following codes into the top of my page
before everything:
<cfprocessingdirective pageEncoding = "utf-8">
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("FORM", "utf-8")>
<cfcontent type="text/html; charset=utf-8">
3.) I wrote some special Hungarian chars
(<p>őű</p>) into the page and they displayed
well all the time.
4.) I've created a simple MySQL db (MySQL Community Edition
5.0.27-community-nt) on my shared hosting server with phpMyAdmin
with default charset of UTF-8 and choosing utf8_hungarian_ci as
default collation. Then I created a MyISAM table and the collation was automatically applied to my varchar field, into which I stored data with special chars. I've checked the properties of the MySQL
server in MySQL-Front prog and found the following settings under
the Variables tab: character_set_client: utf8,
character_set_connection: utf8, character_set_database: latin1,
character_set_results: utf8, character_set_server: latin1,
character_set_system: utf8, collation_connection: utf8_general_ci,
collation_database: latin1_swedish_ci, collation_server:
latin1_swedish_ci.
5.) I wrote a simple insert form into my page and tried it
using both the content of the form field and a hardcoded string
value and even tried to read back the value of the
#FORM.special_char# variable. In each case the special Hungarian chars changed to "q" or "p" letters.
Can anybody see something wrong in the above, or have an idea to test something else?
I am thinking about trying this same page against a db on my other hosting provider's MySQL server.
Here is the link to the form:
http://209.85.117.174/pages/proba/chartest/utf8_1/form.cfm
Thanks, Aron -
Problem inserting special Hungarian characters into db
Hi,
I've posted this question in the database connection forum
but put it here too because I don't know where to fit better.
I read that it would be the most straightforward way to do everything in UTF-8 because it handles special characters well, so I've tried to set up a simple testing environment. Besides, I use CF MX7 and my hosting provider creates the dsn for me, so I think the db driver is JDBC, but I'm not sure.
1.) In Dreamweaver I created a page with UTF-8 encoding, set the Unicode Normalization Form to "C", and checked the Include Unicode Signature (BOM) checkbox. This created a page with the meta tag: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />. I've checked the HTTP header with an online utility at delorie.com and it gave me the following info:
HTTP/1.1, Content-Type: text/html; charset=utf-8, Server:
Microsoft-IIS/6.0
2.) Then I put the following codes into the top of my page
before everything:
<cfprocessingdirective pageEncoding = "utf-8">
<cfset setEncoding("URL", "utf-8")>
<cfset setEncoding("FORM", "utf-8")>
<cfcontent type="text/html; charset=utf-8">
3.) I wrote some special Hungarian chars
(<p>őű</p>) into the page and they displayed
well all the time.
4.) I've created a simple MySQL db (MySQL Community Edition
5.0.27-community-nt) on my shared hosting server with phpMyAdmin
with default charset of UTF-8 and choosing utf8_hungarian_ci as
default collation. Then I created a MyISAM table and the collation was automatically applied to my varchar field, into which I stored data with special chars. I've checked the properties of the MySQL
server in MySQL-Front prog and found the following settings under
the Variables tab: character_set_client: utf8,
character_set_connection: utf8, character_set_database: latin1,
character_set_results: utf8, character_set_server: latin1,
character_set_system: utf8, collation_connection: utf8_general_ci,
collation_database: latin1_swedish_ci, collation_server:
latin1_swedish_ci.
5.) I wrote a simple insert form into my page and tried it
using both the content of the form field and a hardcoded string
value and even tried to read back the value of the
#FORM.special_char# variable. In each case the special Hungarian chars changed to "q" or "p" letters.
Can anybody see something wrong in the above, or have an idea to test something else?
I am thinking about trying this same page against a db on my other hosting provider's MySQL server.
Here is the link to the form:
http://209.85.117.174/pages/proba/chartest/utf8_1/form.cfm
Thanks, Aron
Some new info about the advancements in my project:
I've tried to make the insertion at a third hosting provider's MySQL server with my 'everything is UTF-8' test case and IT'S DONE! There are my lovely spec chars :-)
Then I checked the char encoding, according to -Per's tip, in all of the test MySQL dbs I've used so far, and it reported 'CHARSET=utf8 COLLATE=utf8_hungarian_ci', so this part seems OK to me.
I asked my hosting provider, where my production app should run, about the db driver, and they told me it's JDBC (what version of Jconnect I still don't know). They are ready to append &characterSetResults=UTF-8 to the JDBC connection URL (somebody mentioned this as a possible solution too), but they asked me to provide the complete connection string to be used for my datasource. I've tried to compose it in my localhost development environment in ColdFusion Admin, but it gave me a Connection verification failed error. So I think I did something wrong and need help to write the correct connection string that can be passed to the hosting provider. The connection string structure I tried to use in the JDBC URL field of the datasource area of CFAdmin is something like this:
jdbc:mysql://someipaddresshere/mydbname&characterSetResults=UTF-8
How can it be corrected?
Thanks, Aron -
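For what it's worth, MySQL Connector/J introduces the first URL property after the database name with '?' and joins further properties with '&', so a bare '&characterSetResults=UTF-8' as in the string above would not parse. A sketch of the expected shape (host and database names are the poster's placeholders; the extra useUnicode/characterEncoding properties are my assumption of a typical UTF-8 setup, not something from the thread):

```java
public class JdbcUrlSketch {
    public static void main(String[] args) {
        // First property is attached with '?', subsequent ones with '&'
        String url = "jdbc:mysql://someipaddresshere/mydbname"
                + "?useUnicode=true"
                + "&characterEncoding=UTF-8"
                + "&characterSetResults=UTF-8";
        System.out.println(url);
    }
}
```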
I am confused in setting up the basic stuff.
Hi
Under: "Edit' "Preferences" "New Document"
I set the:
Default Document: HTML
Default Extension: .html
Default Document Type (DTD): HTML 4.01 Transitional
Default Encoding: Unicode (UTF-8)
I check the: Use when opening existing files that don't specify an encoding
Unicode Normalization Form: Gives me 4 choices
C (Canonical Decomposition, followed by Canonical Composition)
D (Canonical Decomposition)
KC (Compatibility Decomposition, followed by Canonical Composition)
KD (Compatibility Decomposition)
Not sure which one to choose? Need help here...
I Check: Include Unicode Signature (BOM)
I am also getting files that look like: ../../../filename.html
How do I correct this?
Help appreciated
Bob
I Check: Include Unicode Signature (BOM)
There's no reason to use BOM Unicode Signature in plain HTML documents.
It's actually better if you don't specify this.
I am also getting files that look like: ../../../filename.html
Define your site. Go to Site > New Site or Manage Sites > tell DW where your local site files are located on your hard drive.
Under Advanced Settings > Local Info, use Links Relative to Document. Save your site definition settings.
Now when you create a new document & save it, DW will manage your file assets and paths relative to the document and not the site root folder.
Nancy O. -
How to embed unicode fonts in fillable form.
Hello friends,
I created a normal Unicode PDF form. It displays perfectly.
But when I make this form fillable, the Unicode characters turn into junk and the form becomes unreadable.
Attached here is a normal Unicode PDF form, as well as form_fillable.pdf, which is garbled after being made fillable.
regards
KD
Check this forum post (though it is for 6i, it should be helpful for you):
How to use unicode fonts in Oracle forms 10g?
-Arun -
Wrong unicode characters when exporting?
Hello guys!
I exported my iTunes library to a Unicode XML file to use it with the TwonkyVision server, which feeds my Roku Soundbridge. The music files are located on a NAS station, and Twonky is also running on that NAS. All of this hardware and software supports Unicode!
In Twonky I can link to the exported XML library file, which works great: my Roku can see and play the files. But only those which don't have strange European characters in them, like ä, ö, ü, é, è, à and so on!
In the beginning I changed some of the filenames and tags and reimported the files, but the number of files grows rapidly, so this is no longer an option. After some long nights fiddling with this topic, I found out that iTunes (for Mac; I didn't test the Windows version) exported the library with a wrong Unicode translation! iTunes does the following for an ü: u%CC%88, but the correct Unicode is %C3%BC, for example. When I changed all the occurrences in the XML file and rebuilt the internal Twonky database, every missing track showed up.
To cut a long story short: is this a bug in iTunes? Can someone else confirm this? Or at least Apple? Does somebody know a solution for this? I have to do the find-and-replace action every time I export my library again, which is annoying... (but at least it works).
Thanx in advance.
-- Hudson
> I found out that iTunes (for Mac, didn't test it for the Windows version) exported the library with a wrong unicode translation! iTunes does the following for an ü: u%CC%88 but the correct unicode is %C3%BC for example.
u%CC%88 is not totally wrong. It represents the decomposed form of ü, that is, u plus a combining diaeresis. I think the OS X file system always uses this; it's called Normalization Form D. I gather you need it in the composed version, Normalization Form C. I don't know whether the failure of iTunes' export function to change the normalization form is a bug or not, but I think search/replace might be the only way to do what you require, unless there's a "normalization converter program" available. For the latter, have a look at UnicodeChecker:
http://earthlingsoft.net/UnicodeChecker/
After installation this provides a Unicode services item which includes several conversion possibilities. -
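The two percent-encoded spellings the poster saw are exactly Forms D and C of the same letter; a Java sketch (my illustration, not from the thread) makes the relationship visible:

```java
import java.net.URLEncoder;
import java.text.Normalizer;

public class UmlautFormsDemo {
    public static void main(String[] args) throws Exception {
        String nfc = "\u00FC";                                       // ü as one code point (Form C)
        String nfd = Normalizer.normalize(nfc, Normalizer.Form.NFD); // u + COMBINING DIAERESIS (Form D)
        System.out.println(URLEncoder.encode(nfd, "UTF-8")); // u%CC%88 -- the form iTunes/HFS+ emits
        System.out.println(URLEncoder.encode(nfc, "UTF-8")); // %C3%BC  -- the form Twonky expects
    }
}
```

Converting the exported XML to Form C with a normalizer is therefore equivalent to the manual find-and-replace described above.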
Combining Diacritical Marks - UNICODE
Hi folks,
Searched the forums and found nada. I want to convert / compose / transform Unicode chars from their decomposed form to the composed form. From the Normalizer API:
>
Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character A-acute. In Unicode, this can be encoded as a single character (the "composed" form):
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):
U+0041 LATIN CAPITAL LETTER A
U+0301 COMBINING ACUTE ACCENT
>
In my case, I have strings containing UTF-8 hex values, e.g., á: 0x61 0xcc 0x81 / U+0061 U+0301, and I want to convert them to the composed form: 0xe1 / U+00E1.
Does anyone know an API to convert from the decomposed form to the composed form? I already tried the Normalizer class, using Normalizer.Form.NFC and Normalizer.Form.NFKC, but it didn't work.
I'm looking into ICU4J now, but up to now it didn't help. Maybe someone knows how to use it properly.
Thanks in advance.
First of all, thanks for answering, DrClap. As I was writing this response I managed to make it work using ICU4J. But, for the sake of the discussion, and to aggregate information in this post (as there aren't many posts about UTF-8 combining diacritical marks in these forums), I'll answer your questions below.
DrClap wrote:
> Danniel_Willian wrote:
> > In my case, I have strings containing UTF-8 hex values, e.g., á: 0x61 0xcc 0x81 / U+0061 U+0301, and I want to convert them to the composed form: 0xe1 / U+00E1.
> I didn't understand this part. Do you mean your string contains the character '0' then the character 'x' then the character '6' then the character '1' then a space, and so on? Or does it contain the byte x61 in the first character and the byte xCC in the second character, and so on? Or what?
Sorry if I didn't make myself clear; reading it again I see that it really isn't clear. Actually I've got a byte stream, let's say from a file, and that byte stream contains those bytes, in that order:
0x61 0xcc 0x81.
These are bytes that represent in UTF-8 the following chars:
0x61 - lower case "a"
0xcc 0x81 - acute accent, a "combining diacritical mark"
> In any case your best bet is to decode the UTF-8 bytes -- if that's what you have in some way -- into Unicode characters, and then apply those normalizing methods you mentioned.
Shouldn't this be transparent? I mean, if I have a stream of valid UTF-8 bytes, shouldn't this kind of "decoding" be seamless when I create a String from it?
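Decoding the bytes is indeed transparent when the String is built with an explicit charset; what is not automatic is the composition step, which Normalizer then handles, as the quoted advice says. A minimal sketch of the whole pipeline (my example):

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class ComposeDemo {
    public static void main(String[] args) {
        byte[] utf8 = {0x61, (byte) 0xcc, (byte) 0x81}; // "a" + COMBINING ACUTE ACCENT in UTF-8
        // Decoding is seamless: the String holds the two code points U+0061 U+0301
        String decomposed = new String(utf8, StandardCharsets.UTF_8);
        // Composition is a separate, explicit step
        String composed = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(composed.length()); // 1 -> the single code point U+00E1 (á)
    }
}
```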
Best regards.
Edited by: Danniel_Willian on May 27, 2009 6:16 PM -
Unicode convertion for Czech Language
Hi all,
my system is not Unicode and I have to convert it into a Unicode one, because we are planning a roll-out project for our Czech branch. We have an ECC5.0 with English, German, Italian and Spanish languages.
I have understood, more or less, what we have to do technically, but I would like to know more about the growth in hardware space needs.
How much will my system reasonably grow after Unicode conversion and Czech language installation? 10%, 20%?
Thanks a lot.
Hi,
Have a look at the link below which will help to answer the question.
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/10317ed9-1c11-2a10-9693-ec0d9a3bc537
https://www.sdn.sap.com/irj/sdn/go/portal/prtroot/docs/library/uuid/589d18d9-0b01-0010-ac8a-8a22852061a2
If the Unicode encoding form is UTF-8, database size growth will be about 10% of the original size.
If it is UTF-16, then it may grow by 60 to 70% of the original size.
Rgds
Radhakrishna D S -
Converting Unicode to plain text
Hi,
Is there any way I can get the string "Internationalization" from "Iñtërnâtiônàlizætiøn"?
You could decompose it using one of the Unicode normalizations:
http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
and then go through the result and remove all of the combining characters. -
I have to do string comparison between an English string and a similar UTF-* string.
Example: Nacori and Nácori, Rayon and Rayón.
Right now my string comparison returns failure for such cases.
Is there any Java library/class that can help to mark such cases with similar names as a pass? Or convert Rayón to a similar English word like Rayon?
I got the solution I was looking for with the help of the "diacritical" keyword. Here is the sample code:
import java.text.Normalizer;
import java.util.regex.Pattern;

public static String deAccent(String str) {
    String nfdNormalizedString = Normalizer.normalize(str, Normalizer.Form.NFD);
    Pattern pattern = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");
    return pattern.matcher(nfdNormalizedString).replaceAll("");
} -
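Wrapped in a runnable class, the helper above can be exercised like this (the class name and main method are mine):

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class DeAccentDemo {
    // Decompose to NFD, then strip the combining marks
    public static String deAccent(String str) {
        String nfd = Normalizer.normalize(str, Normalizer.Form.NFD);
        return Pattern.compile("\\p{InCombiningDiacriticalMarks}+").matcher(nfd).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(deAccent("N\u00e1cori")); // Nacori
        System.out.println(deAccent("Ray\u00f3n"));  // Rayon
    }
}
```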
Script help.. find folders (title oversimplified..)
What I need:
Have list of folder names in Excel
Script uses list to find specified folders. Folders (items in list) are actually three folders deep in directory (external archive disk/dated folder/folder I want).
Script then copies folders to another location.
Basically, I have a main archive directory (Archive1). On that disk are dated folders (102013). Inside the dated folders there are anywhere from 10-20 folders; those contain the data (study data), and those are the folders I want. I will have a list of 50-100 data folders provided to me, all from random dates. I don't want to manually find, then drag and drop data into the new archive. I have a list, and would like to automate this.
What I have:
A script that will find FILES, not folders, and not in subdirectories. It does everything I need, if I were only looking for files in one folder.
The script I have is below. If one of you kind souls would point me in the right direction, I'd appreciate it.
If you want to test the script below:
Make a short list in Excel, maybe just a, b, c in a column.
Create two folders (I named mine archive and transfer) on your desktop. Place three dummy files named a, b, and c in the archive folder. Run the script while the spreadsheet is open. You will be asked to choose the archive folder, then the transfer folder. All files from the archive folder will then be copied to the transfer folder. The spreadsheet will be updated to include filenames with extensions in the B column.
Script:
set studiestofind to (choose folder with prompt "Choose Archive Location")
set transferto to (choose folder with prompt "Choose the Transfer Location")
tell application "Microsoft Excel"
tell active sheet
set lastIndex to first row index of last cell of (get used range)
repeat with i from 2 to lastIndex
set tFileName to my find_filenames(get value of range ("A" & i), studiestofind)
if tFileName is not "" then -- Study Found
set r to my duplicateImage(tFileName, studiestofind, transferto)
if r is not "" then -- no error on duplicate file
set value of range ("B" & i) to tFileName
else
set value of range ("B" & i) to ""
end if
end if
end repeat
end tell
end tell
on duplicateImage(tName, iFolder, mFolder)
try
tell application "Finder" to return (duplicate file tName of iFolder to mFolder) as string
on error
return ""
end try
end duplicateImage
on find_filenames(tVal, thefolder)
--if cell begins with space or ends with space , text items of tVal return empty string, empty string added in list excludedWords
set excludedWords to {"Mini", "and", ""} -- words in Menu Item to exclude in search
set otid to text item delimiters
set text item delimiters to " "
set tWords to text items of tVal
set text item delimiters to otid
set thefolder to quoted form of POSIX path of thefolder
set myListOfWords to {}
repeat with i in tWords -- must be valid
if contents of i is not in excludedWords then set end of myListOfWords to contents of i
end repeat
if (count myListOfWords) = 1 then -- more filler i dont really understand
set i to quoted form of (tVal & ".")
-- search file name
set tpath to do shell script "cd " & thefolder & " && /usr/bin/find . -type f -maxdepth 1 -iname " & i & "*"
if tpath is not "" then return text 3 thru -1 of tpath --return name of exact match
end if
repeat with i in myListOfWords
set tPaths to do shell script "cd " & thefolder & " && /usr/bin/find . -type f -maxdepth 1 -iregex '.*[_/]" & i & "[_.].*' \\! -name '.*' "
if tPaths is not "" then
set L to paragraphs of tPaths
if (count L) = 1 then return text 3 thru -1 of tPaths -- one path found
repeat with tpath in L -- many paths found, loop each path
set isGood to false
repeat with tword in myListOfWords -- check each word of this Menu Item in tpath
if ("/" & tword & "_") is in tpath or ("/" & tword & ".") is in tpath or ("_" & tword & ".") is in tpath or ("_" & tword & "_") is in tpath then
set isGood to true
else
set isGood to false
exit repeat
end if
end repeat
if isGood then return text 3 thru -1 of tpath -- each word of this Menu Item is in name of tpath
end repeat
end if
end repeat
return ""
end find_filenames
Hello
It is not hard to modify the existing script you posted so as to retrieve directories at depth 2, but I chose to rewrite it as follows because its filtering logic is very inefficient. In its current form, the script below will retrieve directories at depth 2 in the chosen folder.
The part scripting Excel is not tested, for I don't have Excel. It might fail to set the value of a cell in column B when there are multiple matches, in which case the script will try to set the cell value to multi-line text of every found path. If it fails, we can adjust the code to work around it. Your original script only processes the first matched file, but I thought it better to process every matched one.
# Notes
• Filtering logic is modified so that it is now wholly processed by find(1). Also the regexp pattern is modified so that it compares the query word with string delimited by _ and . in file name.
E.g., Given query = "apple orange bananna", the original script matches these -
apple_orange_bananna.tex
apple_bananna_orange.dvi
_bananna__orange__apple.ps
_bananna._apple._orange.pdf
_orange.blueberry_bananna_apple....djvu
but not these -
apple.orange.bananna.txt
apple_.orange_.bananna.txt
apple_orange_blueberry.html
apple_orange_bananna
The new script will match all of them except for "apple_orange_blueberry.html".
• I added code to handle the NFD-NFC issue of HFS+ names. An HFS+ name is represented as Unicode in NFD (Normalization Form D), while a name in Excel could be in NFC (Normalization Form C), in which case find(1) will not match a name whose NFD and NFC differ. This is not an issue if the name contains only a-zA-Z0-9, but, e.g., any diacriticals will cause trouble without special treatment.
• Script will write a log to a file on the desktop when a) some query gives no result or b) some of the matched files/folders are not copied because an item with the same name already exists in the destination.
• Script will behave (mostly) the same as the original script when you set the find parameters properties in _main() as -
-- find parameters
property type : "f" -- d = directory, f = file
property mindepth : 1
property maxdepth : 1
• Currently the value in column B will be the (list of) found and copied path(s) and not the found path(s). If you need all of the found path(s) to be put in column B, switch the following statements in _main():
set value of range ("B" & i) to my _join(return, qq) -- found and copied
--set value of range ("B" & i) to my _join(return, pp) -- found
• Script is tested under OSX 10.5.8. (except for Excel scripting)
# Script
_main()
on _main()
script o
-- directories and logfile
property srcdir : (choose folder with prompt "Choose Archive Location")'s POSIX path
property dstdir : (choose folder with prompt "Choose the Transfer Location")'s POSIX path
property logf : (path to desktop)'s POSIX path & "copy_log@" & (do shell script "date +'%F.txt'")
-- find parameters
property type : "d" -- d = directory, f = file
property mindepth : 2
property maxdepth : 2
-- working lists
property pp : {}
property qq : {}
property rr : {}
-- find & copy nodes
tell application "Microsoft Excel"
tell active sheet
set lastIndex to first row index of last cell of (get used range)
repeat with i from 2 to lastIndex
set query to value of range ("A" & i)
set pp to my find_filenames(query, srcdir, type, mindepth, maxdepth)
if pp = {} then -- not found
tell current application to set ts to do shell script "date +'%F %T%z'"
set entry to "Did not find a name with query word(s): " & query
my log_to_file("%-26s%s\\n", {ts, entry}, logf)
else
set qq to {}
repeat with p in my pp
set p to p's contents
set q to my cp(srcdir & p, dstdir, {replacing:false})
if q ≠ "" then
set end of my qq to p -- found and copied
else
set end of my rr to srcdir & p -- found but not copied
end if
end repeat
set value of range ("B" & i) to my _join(return, qq) -- found and copied
--set value of range ("B" & i) to my _join(return, pp) -- found
end if
end repeat
end tell
end tell
-- log duplicates
if rr ≠ {} then
set ts to do shell script "date +'%F %T%z'"
set entry to "Did not copy the following node(s) due to existing name in destination: " & dstdir
my log_to_file("%-26s%s\\n", {ts, entry}, logf)
repeat with r in my rr
my log_to_file("%-28s%s\\n", {"", r's contents}, logf)
end repeat
end if
end script
tell o to run
end _main
on log_to_file(fmt, lst, f)
(*
string fmt : printf format string
list lst : list of strings
string f : POSIX path of log file
*)
local args
set args to ""
repeat with a in {fmt} & lst
set args to args & space & a's quoted form
end repeat
do shell script "printf " & args & " >> " & f's quoted form
end log_to_file
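The logging pattern used by log_to_file() can be tried standalone in a shell; this sketch (log path and message are hypothetical) shows how printf pads the timestamp into a fixed-width 26-character column before the message:

```shell
# Demo of the script's log format: "%-26s%s\n" left-justifies the
# timestamp in a 26-char field, then appends the log entry.
log=/tmp/copy_log_demo.txt
rm -f "$log"                         # start clean for the demo
ts=$(date +'%F %T%z')
printf '%-26s%s\n' "$ts" "Did not find a name with query word(s): apple" >> "$log"
cat "$log"
```

The continuation lines in the duplicates report use a 28-character field ("%-28s") with an empty first argument, which indents them under the header line.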
on cp(src, dstdir, {replacing:_replacing})
(*
string src : POSIX path of source file or directory
string dstdir : POSIX path of destination directory
boolean _replacing : true to replace existing destination, false otherwise
return string : POSIX path of copied file or directory, or "" if not copied
*)
set rep to _replacing = true
if src ends with "/" then set src to src's text 1 thru -2
if dstdir ends with "/" then set dstdir to dstdir's text 1 thru -2
set sh to "
rep=$1
src=\"$2\"
dstdir=\"$3\"
dst=\"${dstdir}/${src##*/}\"
[[ $rep == false && -e \"$dst\" ]] && exit
cp -f -pPR \"$src\" \"$dstdir\" && echo \"$dst\" || exit $?"
do shell script "/bin/bash -c " & sh's quoted form & " - " & rep & " " & src's quoted form & " " & dstdir's quoted form
end cp
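The bash fragment that cp() hands to do shell script can be exercised on its own. This sketch (the /tmp paths are made up for the demo) shows the guard: when replacing is false and a node with the same name already exists in the destination, it exits without printing, so the AppleScript caller receives "":

```shell
rep=false
src=/tmp/cp_demo_src.txt
dstdir=/tmp/cp_demo_dst
printf 'hello\n' > "$src"
mkdir -p "$dstdir"
rm -f "$dstdir/${src##*/}"           # start clean so the copy succeeds
dst="${dstdir}/${src##*/}"
if [[ $rep == false && -e "$dst" ]]; then
  exit 0                             # name exists: print nothing
fi
cp -f -pPR "$src" "$dstdir" && echo "$dst"
```

Running it a second time with the destination left in place prints nothing, which is exactly how the main loop sorts paths into qq (copied) versus rr (skipped duplicates).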
on find_filenames(query, dir, type, mind, maxd)
(*
string query : space delimited words to search
string dir : POSIX path of target root directory
string type : node type to retrieve
f => file
d => directory
integer mind : min depth of nodes in dir to retrieve
integer maxd : max depth of nodes in dir to retrieve
return list : list of POSIX paths of found node(s)
*)
script o
property exclude_list : {"Mini", "and", ""} -- list of words to be excluded from query
property pp : _split(space, NFD(query))
property qq : {}
property rr : {}
-- exclude words in exclude_list from query words
repeat with p in my pp
set p to p's contents
if p is not in my exclude_list then set end of my qq to p
end repeat
-- build arguments for find(1)
(* query words are compared with tokens delimited by _ or . in file name *)
repeat with q in my qq
(*
e.g. Given qq = {"apple", "orange", "bananna"}, rr is a list of -
-iregex '.*[/_.]apple([_.].*|$)'
-iregex '.*[/_.]orange([_.].*|$)'
-iregex '.*[/_.]bananna([_.].*|$)'
(Note: |'s must be balanced even in a comment...)
*)
set q to quotemeta(q) -- quote non-alphanumeric characters to be recognised as literal in regexp
set end of rr to "-iregex '.*[/_.]" & q & "([_.].*|$)'"
end repeat
set |-iregex| to " \\( " & _join(" -and ", my rr) & " \\)"
set |-type| to " -type " & type
set |-mindepth| to " -mindepth " & mind
set |-maxdepth| to " -maxdepth " & maxd
-- build shell script
set sh to "export LC_ALL=en_GB.UTF-8
cd " & dir's quoted form & " || exit
/usr/bin/find -E . " & |-type| & |-mindepth| & |-maxdepth| & |-iregex| & " \\! -name '.*' -print0 |
/usr/bin/perl -CS -ln0e 'print substr($_, 2);'"
-- run shell script
return paragraphs of (do shell script sh)
end script
tell o to run
end find_filenames
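The -iregex predicates treat each query word as a whole token delimited by "_" or "." (or the leading "/") in the file name, so "apple" matches "Mini_Apple_Orange.mov" but not "pineapple_orange.mov". The same idea can be checked anywhere with grep -E, since BSD find's -E flag uses the same extended-regex syntax (the sample names below are made up):

```shell
# Each grep plays the role of one -iregex predicate joined by -and:
# a word must be preceded by / _ or . and followed by _ . or end.
printf '%s\n' \
  './Mini_Apple_Orange.mov' \
  './pineapple_orange.mov' \
  './apple.orange.final.mov' |
grep -iE '[/_.]apple([_.]|$)' |
grep -iE '[/_.]orange([_.]|$)'
```

Only the first and third names survive both filters; "pineapple" is rejected because "apple" is not preceded by a delimiter.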
on NFD(t)
(*
string t : source string
return Unicode text : t in NFD (Normalization Form D)
*)
set pl to "/usr/bin/perl -CSDA -MUnicode::Normalize <<-'EOF' - \"$*\"
print $ARGV[0] ? NFD($ARGV[0]) : qq();
EOF"
do shell script "/bin/bash -c " & pl's quoted form & " - " & t's quoted form
end NFD
on quotemeta(t)
(*
string t : source string
return string : t where all non-alphanumeric characters are quoted by backslash
*)
set pl to "/usr/bin/perl -CSDA -e 'print quotemeta $ARGV[0];' \"$*\""
do shell script "/bin/bash -c " & pl's quoted form & " - " & t's quoted form
end quotemeta
on _join(d, tt)
(*
string d : separator
list tt : source list
return string : tt joined with d
*)
local astid, astid0, t
set astid to a reference to AppleScript's text item delimiters
try
set {astid0, astid's contents} to {astid's contents, {} & d}
set t to "" & tt
set astid's contents to astid0
on error errs number errn
set astid's contents to astid0
error errs number errn
end try
return t
end _join
on _split(d, t)
(*
string or list d : separator(s)
string t : source string
return list : t split by d
*)
local astid, astid0, tt
set astid to a reference to AppleScript's text item delimiters
try
set {astid0, astid's contents} to {astid's contents, {} & d}
set tt to t's text items
set astid's contents to astid0
on error errs number errn
set astid's contents to astid0
error errs number errn
end try
return tt
end _split
Hope this may help
H
Message was edited by: Hiroto (fixed some typos)