[PHP] Link parsing

liam_d · Nov 18, 2008

Basically, I am trying to turn addresses like "www.blah.com/efdsf" into links in my forum script.

I have it working when an address has "http://" in front of it, but not when "www." is on its own.

Here is my code:

PHP:

$post = preg_replace("`\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);

$post = preg_replace("/\s(www\.([a-z][a-z0-9_\..-]*[a-z]{2,6})([a-zA-Z0-9\/*-?&%]*))\s/i", " <a href=\"http://$1\">$1</a> ", $post);

The second one should parse addresses with "www." on its own, but it doesn't.
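
A note on the second pattern: it starts and ends with `\s`, so it can only match a www address that has a whitespace character on both sides. An address at the very start or end of the post, or one followed by a comma, is skipped. Below is a minimal sketch of one possible alternative, assuming a negative lookbehind is acceptable here. The function name `link_bare_www` and the simplified character classes are illustrative and not taken from the forum script.

```php
<?php
// Hypothetical helper (not part of the forum script): links bare "www." addresses only.
function link_bare_www($post)
{
    $pattern = '~(?<![/\w.])'                      // not preceded by "/", a word character or "."
             . '(www\.[a-z0-9-]+(?:\.[a-z0-9-]+)*' // "www." plus dot-separated host labels
             . '(?:/[^\s<]*)?)~i';                 // optional path, up to whitespace or "<"
    return preg_replace($pattern, '<a href="http://$1">$1</a>', $post);
}

echo link_bare_www("www.blah.com/efdsf and http://www.blah.com");
```

Because of the lookbehind, the address that already has "http://" in front of it is left alone, so a separate pattern can handle that case without double parsing. A path that ends in punctuation (for example a trailing comma) would still be swallowed into the link in this sketch.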

RageD · Dec 9, 2008

Ah, edit: I just caught an error. I'll work on it and see what I can do.

-RageD

RageD · Dec 10, 2008

OK, now I'm no regex expert by any stretch of the imagination, but this works without double parsing, etc.:

PHP:

$post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
{
    // Yes, order matters :P First one changes http://www.*
    // Second one just www.* to avoid the double parse
    $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
    $post = preg_replace("(\b[^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
}

-RageD

qbert220 · Dec 10, 2008

Try:

Code:

$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/-\\.\\?&%]*)/i", '<a href="http://$2">$2</a>', $post);

This should avoid matching normal words (the expression must have a dot in it).

liam_d · Dec 13, 2008

RageD said:

OK, now I'm no regex expert by any stretch of the imagination, but this works without double parsing, etc.:

That works, but it won't turn URLs with just a "www." into a link.

qbert220 said:
Try:

Code:

$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/-\\.\\?&%]*)/i", '<a href="http://$2">$2</a>', $post);

This should avoid matching normal words (the expression must have a dot in it).

That gives me this error:

Warning: preg_replace() [function.preg-replace]: Compilation failed: range out of order in character class at offset 65 in /home/prxainfo/public_html/forum/includes/post_parser.php on line 45

Warning: preg_replace() [function.preg-replace]: Compilation failed: range out of order in character class at offset 65 in /home/prxainfo/public_html/forum/includes/post_parser.php on line 45

Warning: preg_replace() [function.preg-replace]: Compilation failed: range out of order in character class at offset 65 in /home/prxainfo/public_html/forum/includes/post_parser.php on line 45

Thanks for all the help so far, guys. Can't wait to get this nailed down.

RageD · Dec 13, 2008

liam_d said:
That works, but it won't turn URLs with just a "www." into a link.

Doh! Sorry, I had an extra \b in there that shouldn't have been there. Try this:

PHP:

$post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
{
    // Yes, order matters :P First one changes http://www.*
    // Second one just www.* to avoid the double parse
    $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
    $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
}

Here is my exact test document:

PHP:

<?php
// main post parser
function main_post_parser($post)
{
    global $db;
    
    $post = htmlentities($post);
    
    $post = trim($post);
    
    // sort out new lines into breaks
    $post = str_replace(array("\r\n", "\r", "\n"), "<br />", $post);
    if( get_magic_quotes_gpc() )
    {
        $post = stripslashes($post);
    }

    if( !is_numeric($post) || $post[0] == '0' )
    {
        // $post = $db->escape($post); <-- Your version
        $post = mysql_escape_string($post); // I don't have the function :) lolz
    }
    
    // check if they are a guest, if they are check if guests links get parsed
    if ($_SESSION['group'] == 4)
    {
        if ($site_config['guest_links_parsed'] == 0)
        {
            // then don't parse
        }
        
        else
        {
            // auto-make links
            // Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
            $post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
            if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
            {
                // Yes, order matters :P First one changes http://www.*
                // Second one just www.* to avoid the double parse
                $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
                $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
            }
        }
    }
    
    // they are not a guest so parse away
    else if ($_SESSION['group'] != 4)
    {
        // auto-make links
        // Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
        $post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
        if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
        {
            // Yes, order matters :P First one changes http://www.*
            // Second one just www.* to avoid the double parse
            $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
            $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
        }
    }
    return $post;
} 
 
echo main_post_parser("Check: http://test.com, check: www.test.com, check: http://www.test.com
Check line break.. w00t :)");

?>
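
For anyone following along: `[^www]` is a character class, so it matches any single character other than "w". It does not mean "not followed by www". The same goes for `[^https?\:\/\/|...]`, which consumes one arbitrary character in front of the www address. A small sketch of the difference, using a negative lookahead as one possible way to express "not www." (the sample URLs are made up for illustration):

```php
<?php
// "[^www]" is the same class as "[^w]": exactly one character that is not "w".
var_dump(preg_match('~^http://[^www]~', 'http://web.example.com')); // int(0): the host starts with "w"
var_dump(preg_match('~^http://[^www]~', 'http://test.com'));        // int(1)

// A negative lookahead tests for the whole string "www." without consuming anything.
var_dump(preg_match('~^http://(?!www\.)~', 'http://web.example.com')); // int(1)
var_dump(preg_match('~^http://(?!www\.)~', 'http://www.test.com'));    // int(0)
```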

qbert220 · Dec 14, 2008

liam_d said:
That gives me this error:

There was a dash in the wrong place in the last character class. Try:

Code:

$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);
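
The reason for the earlier warning: inside a character class, `\/-\.` is read as a range from "/" to ".", and "/" (ASCII 47) sorts after "." (ASCII 46), hence "range out of order". Putting the dash last makes it a literal. A short sketch that applies the corrected pattern from this post to a sample string (the sample text and the `$result` variable are made up for illustration):

```php
<?php
// Corrected pattern from the post above: the dash is last in the final class,
// so it is a literal "-" instead of the middle of a range.
$pattern = "/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i";

$post = "Check: http://test.com, check: www.test.com";
$result = preg_replace($pattern, '<a href="http://$2">$2</a>', $post);
echo $result;
```

Note that `$2` excludes the optional "http://" group, so the visible link text for the first address is "test.com" rather than "http://test.com".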

liam_d · Dec 17, 2008

RageD said:

Doh! Sorry, I had an extra \b in there that shouldn't have been there. Try this:

That gave me this:

www.prxa.info
>www.prxa.com
/>http://www.prxa.info

qbert220 said:
A dash in the wrong place. Try:

Code:

$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);

That worked perfectly. Many thanks, my man!

FrozenNova · Jan 5, 2009

Try using

Code:

$post = strip_tags($post);

straight after defining the $post variable.
This should remove the unwanted page breaks, etc.
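
Note that `strip_tags()` returns a new string instead of changing its argument, and that it removes tags outright without putting whitespace in their place. A tiny sketch with a made-up input:

```php
<?php
$post = "first line<br />second line";
$post = strip_tags($post); // the return value must be assigned back
echo $post;                // prints "first linesecond line"
```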

liam_d · Jan 27, 2009

qbert220 said:
A dash in the wrong place. Try:

Code:

$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);

I found a bug in this: for some reason, if I post "test...test", it gets turned into a link that obviously goes nowhere.

http://forum.prxa.info/viewtopic.php?tid=2&fid=8

Take a look at the second post there.
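
The likely cause: the class `[a-z0-9_\.-]*` allows dots, so "test.." satisfies it, the following `\.` takes the third dot, and "test" passes as a 2-6 letter extension. Below is a minimal sketch of one way to tighten this, assuming hostnames consist of labels separated by single dots. The function name `auto_link` and the exact character classes are illustrative, not the forum script's own.

```php
<?php
// Hypothetical variant of the pattern (names and character classes are illustrative).
function auto_link($post)
{
    $pattern = '~(?<![a-z0-9.-])'        // do not start mid-word or right after a dot
             . '(?:http://)?'            // optional scheme, as in the original pattern
             . '((?:[a-z0-9-]+\.)+'      // one or more labels, each followed by exactly one dot
             . '[a-z]{2,6}(?![a-z0-9-])' // extension of 2-6 letters, then no further label characters
             . '(?:/[^\s<]*)?)~i';       // optional path, up to whitespace or "<"
    return preg_replace($pattern, '<a href="http://$1">$1</a>', $post);
}

echo auto_link("test...test"), "\n";         // unchanged: no label can be followed by ".."
echo auto_link("see www.test.com/a?b=1 ok"); // the www address is linked
```

Like the original, this only understands "http://" and will still link anything shaped like "name.ext" (for example "file.php"), so it narrows the bug rather than validating real domains.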

liam_d · Feb 3, 2009
