Colin Paul Adams wrote:
>> codepoints-to-string(string-to-codepoints(normalize-unicode($in,
>> 'NFKD'))[. lt 127])
> No. Not unless you correct the error in it first.
What error?
The following:
codepoints-to-string(string-to-codepoints(normalize-unicode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ',
'NFKD'))[. le 127])
returns "AAAAAACEEEEIIIINOOOOO"
It drops the Æ and the Ð, but I am not well-educated enough in the NFKD
normalization algorithm to tell whether that is an error or not.
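A rough Python analogue of the expression above may show what happens to those two letters. It is only a sketch: it assumes Python's standard unicodedata module as a stand-in for normalize-unicode(), and it assumes the test string was the Latin-1 range À through Ö (U+00C0..U+00D6). Æ (U+00C6) and Ð (U+00D0) have no decomposition in the Unicode data, so NFKD leaves them as single codepoints above 127 and the filter discards them.

```python
import unicodedata


def nfkd_ascii_only(s: str) -> str:
    # Decompose with NFKD, then keep only codepoints <= 127, like
    # string-to-codepoints(normalize-unicode($in, 'NFKD'))[. le 127]
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(ch for ch in decomposed if ord(ch) <= 127)


# Assumed test input: À through Ö (U+00C0..U+00D6)
sample = "".join(chr(cp) for cp in range(0xC0, 0xD7))

print(nfkd_ascii_only(sample))  # AAAAAACEEEEIIIINOOOOO

# Æ and Ð come out of NFKD unchanged, so the <= 127 filter drops them.
print(unicodedata.normalize("NFKD", "\u00c6\u00d0") == "\u00c6\u00d0")  # True
```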
In addition, I changed lt to le, although I was under the impression that
codepoint 127 was not part of Latin-1 anyway. The code itself was correct, but
the OP's definition of "plain Latin" perhaps needs some clarification.
But here's one that removes all diacritics (the combining marks matched by
\p{M}) but leaves the other characters alone, such as 0, . and ', and also
keeps the two letters dropped above, Æ and Ð:
codepoints-to-string(
string-to-codepoints(
normalize-unicode('ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ0.''', 'NFKD'))
[replace(codepoints-to-string(.), '[\p{M}]', '')])
This returns "AAAAAAÆCEEEEIIIIÐNOOOOO0.'"
but it is not nearly as pretty as Michael's! Note the double
codepoints-to-string (which makes it u-u-u-u-gly!). The alternative, a single
replace() on codepoints-to-string(string-to-codepoints(normalize-unicode(...)))
as a whole, would automatically normalize the result back before the regular
expression could do its work. But like I said, it is u-u-u-ugly!
-- Abel
PS: hope the mailer does not mess too much with the high Latin-1
characters....