
[PHP] Link parsing


liam_d

Basically I am trying to parse addresses like "www.blah.com/efdsf" into links in my forum script.

I have it working when an address has "http://" in front of it, but not when "www." is on its own.

Here is my code:
PHP:
$post = preg_replace("`\b(https?|ftp|file)://[-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);

$post = preg_replace("/\s(www\.([a-z][a-z0-9_\..-]*[a-z]{2,6})([a-zA-Z0-9\/*-?&%]*))\s/i", " <a href=\"http://$1\">$1</a> ", $post);

The second one should parse addresses with "www." on its own, but it doesn't.
 
The views expressed on this page by users and staff are their own, not those of NamePros.
Ah, edit: just caught an error. I'll work on it and see what I can do :)

-RageD
 
ok, now I'm no regex expert by any stretch of the means, but this works without double parsing, etc.:

PHP:
        $post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
	if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
	{
		// Yes, order matters :P First one changes http://www.*
		// Second one just www.* to avoid the double parse
		$post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
		$post = preg_replace("(\b[^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
	}

-RageD
 
Try:

Code:
$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/-\\.\\?&%]*)/i", '<a href="http://$2">$2</a>', $post);

this should avoid matching normal words (the expression must have a dot in it.)
 
RageD said:
ok, now I'm no regex expert by any stretch of the means, but this works without double parsing, etc.:


That works but it won't turn urls with just a "www." into a link.

qbert220 said:
Try:

Code:
$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/-\\.\\?&%]*)/i", '<a href="http://$2">$2</a>', $post);

this should avoid matching normal words (the expression must have a dot in it.)

That errors me out with this:
Warning: preg_replace() [function.preg-replace]: Compilation failed: range out of order in character class at offset 65 in /home/prxainfo/public_html/forum/includes/post_parser.php on line 45


Thanks for all the help so far guys, can't wait to nail it on the head :)
 
liam_d said:
That works but it won't turn urls with just a "www." into a link.

Doh! sorry, had an extra \b in there that shouldn't have been. Try this:

PHP:
$post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
    if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
    {
        // Yes, order matters :P First one changes http://www.*
        // Second one just www.* to avoid the double parse
        $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
        $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
    }

Here is my exact test document:
PHP:
<?php
// main post parser
function main_post_parser($post)
{
    global $db;
    
    $post = htmlentities($post);
    
    $post = trim($post);
    
    // sort out new lines into breaks
    $post = str_replace(array("\r\n", "\r", "\n"), "<br />", $post);
    if( get_magic_quotes_gpc() )
    {
        $post = stripslashes($post);
    }

    if( !is_numeric($post) || $post[0] == '0' )
    {
        // $post = $db->escape($post); <-- Your version
        $post = mysql_escape_string($post); // I don't have the function :) lolz
    }
    
    // check if they are a guest, if they are check if guests links get parsed
    if ($_SESSION['group'] == 4)
    {
        if ($site_config['guest_links_parsed'] == 0)
        {
            // then don't parse
        }
        
        else
        {
            // auto-make links
            // Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
            $post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
            if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
            {
                // Yes, order matters :P First one changes http://www.*
                // Second one just www.* to avoid the double parse
                $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
                $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
            }
        }
    }
    
    // they are not a guest so parse away
    else if ($_SESSION['group'] != 4)
    {
        // auto-make links
        // Thanks to "geirha" from ubuntu forums for his amazing work on this for me :)
        $post = preg_replace("`\b(https?|ftp|file)://[^www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0" target="_blank">\0</a>', $post);
        if(preg_match("/(www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*))/i", $post))
        {
            // Yes, order matters :P First one changes http://www.*
            // Second one just www.* to avoid the double parse
            $post = preg_replace("`\b(https?|ftp|file)://[www][-A-Za-z0-9+&@#/%?=~_|!:,.;]*[-A-Za-z0-9+&@#/%=~_|]\b`", '<a href="\0">\0</a>', $post);
            $post = preg_replace("([^https?\\:\\/\\/|^ftp\\:\\/\\/|^file\\:\\/\\/](www\\.([a-z][a-z0-9_\\.-]*[a-z]{2,6})([a-zA-Z0-9\\/\\*-\\?&%]*)\b))", '<a href="http://$1">\0</a>', $post);
        }
    }
    return $post;
} 
 
echo main_post_parser("Check: http://test.com, check: www.test.com, check: http://www.test.com
Check line break.. w00t :)");

?>
 
liam_d said:
That errors me out with this:

A dash in the wrong place. Try:

Code:
$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);
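
For anyone hitting the same warning, here is a minimal sketch of why moving the dash fixes it. Inside a character class, a dash between two characters is read as a range, and in the earlier pattern `\/-\.` asks for the range from "/" (0x2F) to "." (0x2E), which runs backwards. Putting the dash last in the class makes it a literal dash. The two tiny patterns below are made up for illustration and are not taken from the forum script:

```php
<?php
// "/-." inside [...] is a range from "/" (0x2F) down to "." (0x2E),
// which is reversed, so PCRE refuses to compile the pattern.
$broken = '~[a-z/-.]~';

// A dash placed last in the class cannot start a range, so it is literal.
$fixed = '~[a-z/.-]~';

// @ only hides the compile warning; preg_match() still returns false.
$brokenResult = @preg_match($broken, 'a-b');
$fixedResult  = preg_match($fixed, '-');

var_dump($brokenResult); // bool(false)
var_dump($fixedResult);  // int(1)
```

Escaping the dash as `\-` inside the class should work as well if you would rather keep it in the middle.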
 
RageD said:
Doh! sorry, had an extra \b in there that shouldn't have been. Try this:


That gave me this:

qbert220 said:
A dash in the wrong place. Try:

Code:
$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);

That worked perfectly, many thanks my man!
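
As a quick check of what that one-liner produces, here is a small sketch run against the test string used earlier in the thread. The pattern is the same one, only written in single quotes so the backslashes do not need doubling; the variable names are just for illustration.

```php
<?php
$pattern = '/(http:\/\/)?((www\.)?[a-z][a-z0-9_\.-]*\.[a-z]{2,6}[a-zA-Z0-9\/\.\?&%-]*)/i';
$input   = 'Check: http://test.com, check: www.test.com, check: http://www.test.com';

// $2 is the address without any leading "http://", so the replacement
// adds the prefix back exactly once for every kind of match.
$output = preg_replace($pattern, '<a href="http://$2">$2</a>', $input);

echo $output, "\n";
```

All three addresses come out linked once each, and the visible link text loses its "http://" prefix. The pattern only knows about "http://", so https or ftp addresses would need extra handling.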
 
Try using
Code:
$post = strip_tags($post);
straight after defining the $post variable.
This should remove the unwanted page breaks etc.
 
qbert220 said:
A dash in the wrong place. Try:

Code:
$post = preg_replace("/(http:\\/\\/)?((www\\.)?[a-z][a-z0-9_\\.-]*\\.[a-z]{2,6}[a-zA-Z0-9\\/\\.\\?&%-]*)/i", '<a href="http://$2">$2</a>', $post);

I found a bug in this: for some reason, if I post "test...test" it gets turned into a link, which obviously goes nowhere.

http://forum.prxa.info/viewtopic.php?tid=2&fid=8

Take a look at the second post there.
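
The likely cause is that `[a-z0-9_\.-]*` allows dots inside the host part, so "test.." followed by ".test" satisfies the pattern. One possible way to tighten it, sketched below, is to require each dot-separated label to start and end with a letter or digit. The function name `autolink_sketch` and the exact character sets are assumptions for illustration, not code from the forum script, and this has not been run against the forum itself.

```php
<?php
// Hypothetical helper: a sketch under the assumptions above, not a full URL parser.
function autolink_sketch(string $post): string
{
    // One host label: starts and ends with a letter or digit, dashes allowed inside.
    $label = '[a-z0-9](?:[a-z0-9-]*[a-z0-9])?';

    // The lookbehind stops a match from starting in the middle of a word or
    // right after a dot. Labels are joined by exactly one dot each, then a
    // 2-6 letter ending, then an optional path that starts with "/".
    $pattern = '~(?<![a-z0-9.])(?:http://)?((?:' . $label . '\.)+[a-z]{2,6}(?:/[a-z0-9/.?&%=_-]*)?)~i';

    // Fall back to the untouched post if preg_replace() reports an error (null).
    return preg_replace($pattern, '<a href="http://$1">$1</a>', $post) ?? $post;
}

echo autolink_sketch('test...test'), "\n";          // left unchanged
echo autolink_sketch('see www.test.com now'), "\n"; // www.test.com becomes a link
```

Anything that merely looks like a domain (for example "file.php") would still be linked, so treat this as a starting point.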
 