[PHP] Link parsing

liam_d · Nov 18, 2008

Basically, I am trying to parse addresses like "www.blah.com/efdsf" into clickable links in my forum script.

I have it working when an address has "http://" in front of it, but not when "www." is on its own.

Here is my code:

PHP:

$post = preg_replace("`\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);

$post = preg_replace("/\s(www\.([a-z][a-z0-9_\..-]*[a-z]{2,6})([a-zA-Z0-9\/*-?&%]*))\s/i", " <a href=\"http://$1\">$1</a> ", $post);

The second one should parse addresses with "www." on its own, but it doesn't.

qbert220 · Nov 19, 2008

I can't see anything wrong with the search part. Your $post must have spaces before and after your expected URL. There are some special characters which aren't escaped, which might be causing your problem. Also, I would escape \ within a double-quoted string to avoid confusion ("\z" is the same as "\\z" and '\z', but "\v" is a vertical tab, so it is not the same as "\\v" or '\v').

The replace part uses $1 inside a double-quoted string. PHP does not treat $1 as a variable (names cannot start with a digit), so it reaches preg_replace intact, but to rule out any confusion you can escape the $, use \\1 instead, or put single quotes round the string.

Try:

Code:

$post = preg_replace("/\\s(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))\s/i", ' <a href="http://$1">$1</a> ', $post);
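
For anyone who wants to check the backslash rules described above, here is a small sketch. The variable names are made up for illustration and are not from the forum script.

```php
<?php
// An unrecognised escape such as \z keeps its backslash in double quotes,
// so all three spellings produce the same two-character string.
$unknownEscapeKept = ("\z" === "\\z") && ("\z" === '\z');

// \v is a recognised escape (vertical tab), so "\v" is a single character,
// while '\v' and "\\v" are a backslash followed by the letter v.
$vIsSingleChar = (strlen("\v") === 1) && (strlen('\v') === 2);

var_dump($unknownEscapeKept);
var_dump($vIsSingleChar);
var_dump("\\v" === '\v');
```

Writing the pattern in single quotes, or doubling every backslash in double quotes, avoids having to remember which letters are special.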

liam_d · Nov 19, 2008

I tried the code you posted with no luck; it still doesn't do anything.

qbert220 · Nov 19, 2008

I tried the following and it worked:

test.php:

Code:

<?php

$post = ' www.blah.com/efdsf ';

$post = preg_replace("/\\s(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))\s/i", ' <a href="http://$1">$1</a> ', $post);

echo $post;
echo "\n";

?>

$ php test.php
<a href="http://www.blah.com/efdsf">www.blah.com/efdsf</a>
$

liam_d · Nov 19, 2008

Well, it still won't work on my end. Here is the whole code:

PHP:

// main post parser
function main_post_parser($post)
{
	global $db;
	
	$post = htmlentities($post);
	
	$post = trim($post); 
	
	// sort out new lines into breaks
	$post = str_replace(array("\r\n", "\r", "\n"), "<br />", $post);
	if( get_magic_quotes_gpc() ) 
	{
		$post = stripslashes($post);
	}

	if( !is_numeric($post) || $post[0] == '0' ) 
	{
		$post = $db->escape($post);
	}
	
	// check if they are a guest, if they are check if guests links get parsed
	if ($_SESSION['group'] == 4)
	{
		if ($site_config['guest_links_parsed'] == 0)
		{
			// then don't parse
		}
		
		else
		{
			// auto-make links
			// Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
			$post = preg_replace("`\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
			$post = preg_replace("/\\s(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))\s/i", ' <a href="http://$1">$1</a> ', $post);

		}
	}
	
	// they are not a guest so parse away
	else if ($_SESSION['group'] != 4)
	{
		// auto-make links
		// Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
		$post = preg_replace("`\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
		$post = preg_replace("/\\s(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))\s/i", ' <a href="http://$1">$1</a> ', $post);
	}
	
	return $post;
}

mvl · Nov 19, 2008

Why do you write your own parser? PHP has an excellent built-in function for it :

parse_url()

liam_d · Nov 19, 2008

It is not just a URL parser; it parses BBCode, HTML input, etc. I am trying to turn the address into a clickable URL, and parse_url() just breaks it down into pieces, which doesn't do much for me.

qbert220 · Nov 19, 2008

mvl said:
Why do you write your own parser? PHP has an excellent built-in function for it :

parse_url()

I think you have misunderstood the problem. parse_url is for parsing a single URL. The OP is trying to replace a URL within a string with an HTML link.

liam_d said:
Well it still won't work my end here is the whole code:

I think my test shows that the preg_replace code should work. I guess that one of the conditions is not being met in your code. Perhaps ($_SESSION['group'] == 4) and ($site_config['guest_links_parsed'] == 0).

liam_d · Nov 19, 2008

Look at the second branch, the one that checks whether the session group isn't 4 (mine is not; it is 1). The guest setting does not matter for that branch.

qbert220 · Nov 19, 2008

OK - What is your $post input string? If you call:

Code:

$_SESSION['group'] = 1;
echo main_post_parser(' www.blah.com/efdsf ');

what do you get?

liam_d · Nov 19, 2008

"www.blah.com/efdsf"

I also output the session to check, and the group is definitely 1.

qbert220 · Nov 19, 2008

I found the problem. You have:

Code:

    $post = trim($post);

which removes the leading and trailing space. Then the preg_replace will not match the string. Move the trim to the end of the function instead and try again.

The code as it stands replaces newlines with "<br />". This will stop it matching a URL after or before a newline. You might want to move this replacement to the end of the function also and change the preg to:

Code:

$post = preg_replace("/(\\s)(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))(\\s)/i", '$1<a href="http://$2">$2</a>$5', $post);

The above preserves the matched whitespace before and after the URL.
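
To see the effect of trim() in isolation, here is a minimal sketch using the pattern above. The variable names are made up for the example.

```php
<?php
// Pattern from the post above: it needs whitespace on both sides of the URL.
$pattern = "/(\\s)(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))(\\s)/i";
$replace = '$1<a href="http://$2">$2</a>$5';

$padded  = ' www.blah.com/efdsf ';
$trimmed = trim($padded);

// After trim() there is no whitespace left to match, so nothing is replaced.
$fromTrimmed = preg_replace($pattern, $replace, $trimmed);

// With the surrounding spaces intact the URL is wrapped in a link.
$fromPadded = preg_replace($pattern, $replace, $padded);

echo $fromTrimmed, "\n";
echo $fromPadded, "\n";
```

The first line printed is the bare address, which matches what main_post_parser() returned in the test call.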

liam_d · Nov 19, 2008

Wait, but that's just it: what if the URL is right at the start with no space? Then it won't parse it.

qbert220 · Nov 19, 2008

Add a space before and after it then:

Code:

function main_post_parser($post)
{
    global $db;
    
    $post = " $post ";
    $post = htmlentities($post);
...

liam_d · Nov 19, 2008

Can we not just stop the preg_replace having to have spaces?

qbert220 · Nov 19, 2008

Try:

Code:

$post = preg_replace("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", '<a href="http://$1">$1</a>', $post);
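
One caveat with dropping the whitespace groups entirely: the pattern can then also match a "www." address that is already inside an href. A rough sketch of a middle ground is to accept the start of the string as well as whitespace on the left, and to use a lookahead for whitespace or the end of the string on the right. The variable names and the simplified path character class are assumptions of mine, and the sketch assumes it runs before newlines are turned into "<br />".

```php
<?php
// Hypothetical variant: left boundary is start-of-string or whitespace,
// right boundary is a lookahead, so no padding spaces are needed.
$pattern = '/(^|\s)(www\.[a-z][a-z0-9_.-]*[a-z]{2,6}[a-z0-9\/?&%=-]*)(?=\s|$)/i';
$replace = '$1<a href="http://$2">$2</a>';

// Address at the very start of the string, with no spaces around it.
$atStart = preg_replace($pattern, $replace, 'www.blah.com/efdsf');

// Address in the middle of a sentence.
$inMiddle = preg_replace($pattern, $replace, 'see www.blah.com now');

// Address already inside an href: preceded by "/", so it is left alone.
$inHref = preg_replace($pattern, $replace, '<a href="http://www.test.com">x</a>');

echo $atStart, "\n", $inMiddle, "\n", $inHref, "\n";
```

Because the right-hand side is a lookahead, the trailing whitespace is not consumed and does not need to be put back in the replacement.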

j0hnyl · Nov 19, 2008

try this:

http://simplehtmldom.sourceforge.net/manual.htm

it works magic!

liam_d · Nov 19, 2008

qbert220 said:

I tried that with this:

Code:

www.prxa.info/test
http://www.test.com
test.com
http://prxa.info

and got this:

Code:

www.prxa.info/test
/>www.test.com" target="_blank">http://www.test.com
/>test.com
http://prxa.info
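
The mangled output above looks like two separate effects. First, inside the character class, \*-\? is a range from * to ?, which happens to include <, = and >, so a match can run on into the "<br" that replaced the newline. Second, the bare "www." pattern runs after the "http://" pattern and re-matches the www.test.com that is already inside an href. One possible way around both, as a minimal sketch assuming PHP 5.3+ for the closure, is to match either form in a single pass with preg_replace_callback. The function name autolink_sketch and the exact character classes are my own assumptions, not code from the thread.

```php
<?php
// Hypothetical helper: links http(s)/ftp URLs and bare "www." hosts in one
// pass, so text that has already been matched is never scanned again.
function autolink_sketch($text)
{
    // Group 1 is either a scheme ("http://", "https://", "ftp://") or "www.".
    // The lookbehind stops a match from starting in the middle of a word,
    // path, host name or e-mail address.
    $pattern = '~(?<![\w/.@-])((?:https?|ftp)://|www\.)[a-z0-9][a-z0-9.-]*\.[a-z]{2,6}(?:/[a-z0-9/_.?&;%=+-]*)?~i';

    return preg_replace_callback($pattern, function ($m) {
        // Bare "www." addresses need a scheme added for the href.
        $href = (strtolower($m[1]) === 'www.') ? 'http://' . $m[0] : $m[0];
        return '<a href="' . $href . '" target="_blank">' . $m[0] . '</a>';
    }, $text);
}

$input = 'www.prxa.info/test<br />http://www.test.com<br />test.com<br />http://prxa.info';
echo autolink_sketch($input), "\n";
```

With the four test lines joined by "<br />", test.com is left alone because it has neither a scheme nor a "www." prefix, and the hyphen sits at the end of each character class so it cannot form an accidental range.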

qbert220 · Nov 19, 2008

j0hnyl said:
try this:

http://simplehtmldom.sourceforge.net/manual.htm

it works magic!

This may be magic, but will not help solve this problem. We are not trying to parse HTML.

j0hnyl · Nov 19, 2008

It can be a great tool for parsing links as well, if you just put it in the <a> tag... but I guess it's not exactly what you guys are talking about.
