Regular Expression Help [2 lines]

Hi folks,

Please see the regex below and let me know where I'm going wrong! I have been banging my head on this for a few hours already and I am just stumped.

PHP:
$subcategories = '<li><a href="test">Test - Sports</a></li>
<li><a href="test">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>';
$subcategories = preg_replace('/<a (.*?)/>(.*?)/<\/a/>','<a '.$1.'>'.(isset(explode(' - ', $2)[1])) ? explode(' - ', $2)[1]:$2.'</a>', $subcategories);

I want it to become:
HTML:
<li><a href="test">Sports</a></li>
<li><a href="test">Fashion</a></li>
<li><a href="test">Technology</a></li>

Any help would be appreciated and repped, thanks!
 
You could parse the whole string into an array with preg_split (the older split function was removed in PHP 7), using the line break as a delimiter.
PHP:
<?php
$subcategories = '<li><a href="test">Test - Sports</a></li>
<li><a href="test">Test - Fashion</a></li>
<li><a href="test">Test - Technology</a></li>';
//$subcategories = preg_replace('/<a (.*?)/>(.*?)/<\/a/>','<a '.$1.'>'.(isset(explode(' - ', $2)[1])) ? explode(' - ', $2)[1]:$2.'</a>', $subcategories);
// Split on any run of CR/LF characters; each list item becomes one array element.
$lines = preg_split('/[\r\n]+/', $subcategories);

print_r($lines);
?>
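
Splitting on its own does not remove the leading word, so here is a minimal sketch of how the per-line approach could be finished. It assumes every link text has the form "word - label" and that the prefix directly follows the end of the opening anchor tag. The variable names ($lines, $cleaned, $result) are just illustrative.

```php
<?php
$subcategories = '<li><a href="test">Test - Sports</a></li>
<li><a href="test">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>';

// Split on any run of CR/LF characters and skip empty pieces.
$lines = preg_split('/[\r\n]+/', $subcategories, -1, PREG_SPLIT_NO_EMPTY);

$cleaned = [];
foreach ($lines as $line) {
    // Drop the first "word - " that directly follows a ">" (limit 1 = first match only).
    $cleaned[] = preg_replace('/>\w+ - /', '>', $line, 1);
}

// Put the list back together, one item per line.
$result = implode("\n", $cleaned);
echo $result, "\n";
```

Whether this is preferable to a single preg_replace over the whole string is a matter of taste; the split mainly helps if each line needs further individual handling.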
 
Thanks sdsinc, I will give this a shot. I am not sure this will solve the whole problem. I still need to match on the hyphen (-) and remove the first word from before it. That's the part I'm having the most trouble with.
 
This captures everything except the first word inside the anchor (assuming the href target is always "test"):
PHP:
<?php
$subcategories = '<li><a href="test">Test - Sports</a></li>
<li><a href="test">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>';

$pattern = '/(<li><a href="test">)\w+ - (.*)(<\/a><\/li>)/i';

$replacement = '$1$2$3';
echo preg_replace($pattern, $replacement, $subcategories);
?>
 
Close, but my example wasn't loose enough. The list items and anchor tags actually have title attributes and differing href values, so it would be something more like this:
PHP:
$subcategories = '<li><a href="sports.htm" title="Sports keywords">Test - Sports</a></li>
<li><a href="fashion.php" title="Fashion Stuff">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>';

Rep given, I appreciate you looking at this for me.
 
OK let's spice up things lol. What about this:
PHP:
<?php
$subcategories = '<li><a href="sports.htm" title="Sports keywords">Test - Sports</a></li>
<li><a href="fashion.php" title="Fashion Stuff">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>'; 

$pattern = '/<li><a href="([^"]+)"(?: title="[^"]+")?>\w+ - (.*)<\/a><\/li>/im';

$replacement = '<li><a href="$1">$2</a></li>';
echo preg_replace($pattern, $replacement, $subcategories); 
?>
Result:
HTML:
<li><a href="sports.htm">Sports</a></li>
<li><a href="fashion.php">Fashion</a></li>
<li><a href="test">Technology</a></li>

First of all, note that we capture only the two fields you need: the href value and the link text.
Basically, the pattern says:
  • capture the value within href="" (anything but a double quote)
  • allow for an optional title attribute, located one space after the href
  • note the ?: at the beginning of the title group: it matches the expression between the parentheses without capturing it, since I assume you don't need it
  • the title value uses the same simplistic expression: anything but a double quote
  • \w+ - matches the leading word and the hyphen outside any capture group, so that part is dropped from the output
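
The same pattern can be easier to read when written with the x (extended) modifier, which ignores whitespace in the pattern and allows # comments. This is only a sketch of the pattern above, not a different technique. Literal spaces are written as [ ] because of x, and ~ is used as the delimiter (my choice) so the slashes need no escaping.

```php
<?php
$subcategories = '<li><a href="sports.htm" title="Sports keywords">Test - Sports</a></li>
<li><a href="fashion.php" title="Fashion Stuff">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>';

$pattern = '~
    <li><a[ ]href="([^"]+)"   # group 1: href value, anything but a double quote
    (?:[ ]title="[^"]+")?     # optional title attribute, matched but not captured
    >
    \w+[ ]-[ ]                # leading word plus " - ", not captured, so dropped
    (.*)                      # group 2: the rest of the link text
    </a></li>
~ix';

$replacement = '<li><a href="$1">$2</a></li>';
$result = preg_replace($pattern, $replacement, $subcategories);
echo $result, "\n";
```

This should produce the same three lines as the result shown above.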

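Note the /im modifiers at the end of the pattern: i means case-insensitive and m stands for multiline. The m modifier only changes how ^ and $ behave (they then match at every line break), so with no anchors in this pattern it has no effect here and could be dropped. There are other modifiers, such as u (Unicode).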
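
If the title attribute should be kept rather than dropped, or if some links have no " - " in their text, a callback may be a better fit. In the original attempt, $1 and $2 were used as if they were PHP variables, but preg_replace only substitutes them inside the replacement string, so calling explode() on a match needs preg_replace_callback. The sketch below assumes PHP 7 or later (for the ?? operator) and link text without nested tags. The fourth list item is made up to show the fallback.

```php
<?php
$subcategories = '<li><a href="sports.htm" title="Sports keywords">Test - Sports</a></li>
<li><a href="fashion.php" title="Fashion Stuff">Something - Fashion</a></li>
<li><a href="test">Random - Technology</a></li>
<li><a href="plain.htm">No separator here</a></li>';

$result = preg_replace_callback(
    // Group 1: opening <a ...> tag with all attributes, group 2: link text, group 3: </a>.
    '~(<a\b[^>]*>)([^<]*)(</a>)~i',
    function (array $m): string {
        // Split the link text at the first " - " only.
        $parts = explode(' - ', $m[2], 2);
        // Use the part after the separator, or the whole text when there is none.
        $text = $parts[1] ?? $m[2];
        return $m[1] . $text . $m[3];
    },
    $subcategories
);

echo $result, "\n";
```

Because the opening tag is copied through untouched, any attributes on it (href, title, or others) survive unchanged.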
 
That will work perfectly for what I need, thank you! By the way I like you too much to give more rep to you, apparently. :/
 