You probably just want to add 'org\.uk' to your list (without the quotes). Periods are special in regexps so you need to escape them with a backslash when you want to use one normally...
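To illustrate the escaping point, here is a minimal Python sketch. The script from this thread is not shown, so both patterns and the `matches` helper are made up for the example.

```python
import re

# Hypothetical patterns for illustration only; not the script from this thread.
# In UNESCAPED the "." before "uk" is a wildcard that matches any character.
UNESCAPED = re.compile(r"^test\.org.uk$")
# In ESCAPED the "\." matches only a literal period.
ESCAPED = re.compile(r"^test\.org\.uk$")

def matches(pattern, text):
    """Return True if the compiled pattern matches the text from its start."""
    return pattern.match(text) is not None

if __name__ == "__main__":
    for candidate in ("test.org.uk", "test.orgXuk"):
        print(candidate, matches(UNESCAPED, candidate), matches(ESCAPED, candidate))
```

Both patterns accept test.org.uk, but only the unescaped one also accepts a string like test.orgXuk.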
What I need is to include the extension org.uk in that sequence of top TLDs, which are listed without the "." in front of them. The extension org.uk has a "." between the org and the uk, which interferes with the results.
That script is a domain list cleaner: every character or word that is not part of a domain is excluded from the results. The way the list is now causes domains like test.org.uk to be excluded as well, when they shouldn't be.
For example, with the script as it is now, a sample list like:
1. test.com
2 . test.net zxzx zxz.xzxzxz
3. test.org.uk
4. test.ca .kjkjk
5. 4444 test.us
1212121212112
would return only:
test.com
test.net
test.ca
test.us
thus also excluding the test.org.uk domain as garbage simply because it has a "." in there.
So the question is: how can I include org.uk in that sequence of top TLDs?
Your help is appreciated.
Last edited by YesBrilliant : 03-08-2007 at 08:02 PM.
Oh, I think I see the problem... the .org part is matching before the regexp can match it with .org.uk. You could probably play with the greediness of the regexp, but it might just be easier to add 'org\.uk' BEFORE the 'org' option...
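Here is a rough Python sketch of the ordering effect. It assumes an unanchored pattern and a made-up TLD list, since the real script is not shown in this thread; `first_domain` is a hypothetical helper. Alternatives in a group are tried left to right, so with 'org' listed first the match can stop at test.org.

```python
import re

# Hypothetical TLD lists; the real script's list is not shown in this thread.
# Alternatives are tried left to right, so 'org' can win before 'org\.uk'.
ORG_FIRST = re.compile(r"[a-z0-9-]+\.(?:com|net|org|org\.uk|ca|us)")
ORG_UK_FIRST = re.compile(r"[a-z0-9-]+\.(?:com|net|org\.uk|org|ca|us)")

def first_domain(pattern, line):
    """Return the first domain-like match in the line, or None."""
    m = pattern.search(line)
    return m.group(0) if m else None

if __name__ == "__main__":
    line = "3. test.org.uk"
    print(first_domain(ORG_FIRST, line))     # stops at the shorter 'org'
    print(first_domain(ORG_UK_FIRST, line))  # picks up the full 'org.uk'
```

If the real pattern is anchored to the end of the token, the engine would backtrack and the order would matter less, so treat this as one possible explanation rather than a diagnosis.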
Please, don't do that. Firstly, other people that come to the thread are going to see your (current) code, then get confused when I start making suggestions to do things that you already apparently did.
Secondly, unless you have that original code somewhere, you're going to have to take my word for it that with that sample set, your original code did the same thing :)
The problem that you experienced there is related not to the regular expression being used to match, but the regular expression you're using to split. The split assumes that you have a space on the line: newlines were not being considered...
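The split issue can be sketched in Python like this. The input string and both function names are assumptions for the example, not the code from this thread. Splitting on a single space leaves the last token of one line glued to the first token of the next, while splitting on any whitespace treats newlines as separators too.

```python
import re

# Hypothetical input: the first two lines of the sample list, newline-separated.
RAW = "1. test.com\n2 . test.net"

def tokens_by_space(text):
    """Split on single spaces only; newlines stay inside the tokens."""
    return text.split(" ")

def tokens_by_whitespace(text):
    """Split on any run of whitespace, so newlines also separate tokens."""
    return re.split(r"\s+", text.strip())

if __name__ == "__main__":
    print(tokens_by_space(RAW))
    print(tokens_by_whitespace(RAW))
```

With the space-only split, test.com ends up fused with the "2" from the following line, so a pattern that expects a clean token would reject it.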