NamePros
08-03-2010, 05:46 PM   #1
softgroups (thread starter)
NamePros Regular
Join Date: Sep 2005
Location: Romania
Posts: 496

Extract keywords from domain name


Any ideas on how to create a fast script to extract keywords from a domain name?
For example, "personal computers" from personalcomputers.com,

or "red loan" from "redloan".

Basically, a tool similar to this one:

EstiBot.com - Extract Words from Domains - Keyword Parser

08-03-2010, 06:37 PM   #2
baxter
NamePros Regular
Join Date: Apr 2006
Posts: 360

Hey Softgroups,

The way I accomplished it was the following:

strip the TLD
split the remaining characters into an array with str_split
using a dictionary (pspell/enchant classes):
add a letter to the word and test it; if it comes back as a word, save it
continue on to see if there is a longer word
if no longer word exists, remove the letters of the captured word
continue on with the remaining letters to see if they make up words

A lot more went into it, but that will give you a start. I don't have the class any longer because we decided to use a service that offered this as part of its services; it also did language detection, etc.
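
A minimal sketch of the steps above, under a few assumptions: the dictionary is a plain PHP array of known words rather than pspell/enchant, and the function names splitDomainKeywords and segmentLabel are made up for illustration. It tries the longest matching prefix first and backtracks to a shorter one when the remaining letters cannot be split.

```php
<?php
// Hypothetical helper: split a domain name into dictionary words.
// $words is a lookup set with the known words as array keys.
function splitDomainKeywords($domain, array $words)
{
    // Strip the TLD: keep everything before the first dot.
    $parts = explode('.', strtolower($domain));
    $result = segmentLabel($parts[0], $words);
    return $result === null ? array() : $result;
}

// Try the longest prefix first; fall back to shorter prefixes
// when the rest of the label cannot be split into words.
function segmentLabel($label, array $words)
{
    if ($label === '') {
        return array();
    }
    for ($len = strlen($label); $len >= 1; $len--) {
        $prefix = substr($label, 0, $len);
        if (!isset($words[$prefix])) {
            continue;
        }
        $rest = segmentLabel(substr($label, $len), $words);
        if ($rest !== null) {
            array_unshift($rest, $prefix);
            return $rest;
        }
    }
    return null; // no complete split with this dictionary
}

// Tiny demo dictionary (an assumption; a real word list would be much larger).
$words = array_flip(array('red', 'loan', 'person', 'personal', 'computer', 'computers'));

print_r(splitDomainKeywords('personalcomputers.com', $words)); // personal, computers
print_r(splitDomainKeywords('redloan', $words));               // red, loan
```

The recursion can get slow on long names with many overlapping words; remembering which suffixes already failed would be the usual fix.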

08-05-2010, 10:27 AM   #3
xrvel
i love automation
Join Date: Nov 2007
Location: xrvel.com
Posts: 1,615

Okay, maybe this sounds silly, but I guess you can use Google to split the word.
For example, if you search for "redloan", Google will suggest searching for "red loan" instead.
So I guess you could use some cURL code that queries Google.
But of course you can get your website banned if you send too many requests.

So I guess you should use a dictionary instead.

08-05-2010, 02:19 PM   #4
sourcez
NamePros Regular
Join Date: Nov 2007
Location: UK
Posts: 403

I'd do it similarly to baxter, using a dictionary to search the name for keywords.

Give me a shout if you're interested in the code behind this; I could probably help you in PHP if you wanted.

08-05-2010, 03:09 PM   #5
Erdy
Senior Member
Join Date: Sep 2006
Location: London, UK
Posts: 1,900

With baxter's method, do you need to upload your own wordlists to your database to make the comparison, or is there a PHP library that comes with wordlists in various languages?

The reason I'm asking is that baxter mentioned pspell, and I'm wondering how that or any other PHP library works without its own wordlist.

What if you need to run these queries in large numbers? For instance, I need 130,000 queries per day. Is baxter's method going to handle this?

08-06-2010, 03:04 AM   #6
sourcez
NamePros Regular
Join Date: Nov 2007
Location: UK
Posts: 403

I've never used either, and I think it would probably be easier to use your own wordlist, either in a database or in a file that the script can reference. You can download word lists for free from a number of sites, so getting hold of them won't be a problem.
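
A rough sketch of the file-based option, assuming a plain text file with one word per line. The function name loadWordSet is made up, and the temporary file only stands in for a downloaded word list.

```php
<?php
// Hypothetical helper: load a one-word-per-line file into a lookup set.
function loadWordSet($path)
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return array(); // file missing or unreadable
    }
    $set = array();
    foreach ($lines as $line) {
        $word = strtolower(trim($line));
        if ($word !== '') {
            $set[$word] = true; // array keys allow fast isset() lookups
        }
    }
    return $set;
}

// Demo: a temporary file standing in for a real word list.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "Red\nloan\n\npersonal\ncomputers\n");
$words = loadWordSet($path);
unlink($path);

var_dump(isset($words['red'])); // bool(true)
var_dump(isset($words['zzz'])); // bool(false)
```

Storing the words as array keys means each lookup is a single isset() call instead of a scan through the list.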

I'm not fully aware of how those two libraries work, but Aspell (which is the current one for non-Windows PHP) looks like it supports a lot of languages - look here: Supported - GNU Aspell 0.60.6

Because you've got everything on your own server, there shouldn't be a problem scaling this to a huge number of queries. Maybe introduce some caching in PHP so that repeated lookups don't have to be recomputed?
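
One way the caching idea could look, as a sketch only: the class name CachedSplitter and the stand-in splitter are invented for the example, and the cache lives in memory for a single PHP process. Sharing results between requests would need an external store.

```php
<?php
// Hypothetical wrapper: remember results so a repeated name is only split once.
class CachedSplitter
{
    private $splitter;         // callable that does the real work
    private $cache = array();  // lowercased name => result
    public $misses = 0;        // how often the real splitter was called

    public function __construct($splitter)
    {
        $this->splitter = $splitter;
    }

    public function split($name)
    {
        $key = strtolower($name);
        if (!array_key_exists($key, $this->cache)) {
            $this->misses++;
            $this->cache[$key] = call_user_func($this->splitter, $key);
        }
        return $this->cache[$key];
    }
}

// Stand-in splitter for the demo; a real one would consult the dictionary.
$slow = function ($name) {
    return $name === 'redloan' ? array('red', 'loan') : array($name);
};

$cached = new CachedSplitter($slow);
$cached->split('redloan');
$cached->split('RedLoan'); // served from the cache
echo $cached->misses, "\n"; // 1
```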

08-06-2010, 08:40 AM   #7
Erdy
Senior Member
Join Date: Sep 2006
Location: London, UK
Posts: 1,900

Thank you sourcez, especially for the aspell link. I'm not a programmer and I don't understand much of this except it could be useful for my website.

08-06-2010, 05:52 PM   #8
baxter
NamePros Regular
Join Date: Apr 2006
Posts: 360

If you're using the two classes, you only add the words that you want to be treated as words. For example, you may want to treat "facebook" as a legitimate word so that it doesn't get split into "face book".
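
To illustrate the effect of adding your own words, here is a small sketch that uses plain PHP arrays instead of pspell/enchant. The function name fewestWords is made up; it picks the split with the fewest dictionary words, so a custom entry wins over its parts.

```php
<?php
// Hypothetical helper: split $label into the fewest dictionary words,
// or return null when no complete split exists.
function fewestWords($label, array $words)
{
    $n = strlen($label);
    $best = array(0 => array()); // $best[$i] = best split of the first $i characters
    for ($i = 1; $i <= $n; $i++) {
        for ($j = 0; $j < $i; $j++) {
            $piece = substr($label, $j, $i - $j);
            if (!isset($best[$j]) || !isset($words[$piece])) {
                continue;
            }
            $candidate = array_merge($best[$j], array($piece));
            if (!isset($best[$i]) || count($candidate) < count($best[$i])) {
                $best[$i] = $candidate;
            }
        }
    }
    return isset($best[$n]) ? $best[$n] : null;
}

$dictionary = array('face' => true, 'book' => true);
$custom     = array('facebook' => true); // words you choose to treat as whole words

print_r(fewestWords('facebook', $dictionary));           // face, book
print_r(fewestWords('facebook', $dictionary + $custom)); // facebook
```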

For just doing the keyword extraction, you should have absolutely no problem doing the 100,000 per day. It's actually very fast.

I found some of the old prototype code still on my machine; I've supplied it below in case it helps.

I'd run it and give you the numbers it can do, but I moved to PHP 5.3, where pspell is no longer available to me, so you're best off converting this code to use enchant.

Cheers,
Jay

This function was part of a class, so $this->container refers to an array of dictionaries; in our case we were using English and French. Hope this helps.

Code:
public function suggest($word) {
      /**
      * setup our suggestion array
      */
      $suggestions = array();
      
      /**
      * init removed array
      */
      $removed = array('end' => '','start' => '');

      /**
      * any numbers at the beginning to remove
      */
      if(1 === preg_match('/^(\d+)/',$word,$digits)) {
          /**
          * add to our removed
          */
          $removed['start'] = $digits[1];

          /**
          * remove only the leading digits (str_replace would strip every occurrence)
          */
          $word = (string) substr($word, strlen($removed['start']));
      }

      /**
      * any numbers at the end to remove
      */
      if(1 === preg_match('/(\d+)$/',$word,$digits)) {
          /**
          * add to our removed
          */
          $removed['end'] = $digits[1];

          /**
          * remove only the trailing digits
          */
          $word = (string) substr($word, 0, -strlen($removed['end']));
      }

      /**
      * loop through and look for suggestions
      */
      foreach($this->container as $lang => $int) {
          /**
          * make sure we check for single word
          */
          if(true === pspell_check($int, $word)) {
              /**
              * we're correct :D (trim drops the padding when no digits were removed)
              */
              $suggestions[$lang] = trim($removed['start'].' '.$word.' '.$removed['end']);

              /**
              * nothing more to see
              */
              continue;
          }

          /**
          * get the suggestions
          */
          $suggest = pspell_suggest($int, $word);

          /**
          * did we get any suggestions (pspell_suggest may return false)
          */
          if(!empty($suggest)) {
              /**
              * make sure lowercase
              */
              $suggest[0] = strtolower(trim($suggest[0]));

              /**
              * does it equal our word?
              */
              if($word === str_replace(' ','',$suggest[0])) {
                  /**
                  * we're correct :D
                  */
                  $suggestions[$lang] = trim($removed['start'].' '.$suggest[0].' '.$removed['end']);

                  /**
                  * nothing more to see
                  */
                  continue;
              }

          }
          
          /**
          * init our variables
          */
          $extra = array();
          $found = array();

          /**
          * split word into an array
          */
          $wordArray = str_split($word);

          /**
          * loop through all possible suggestions
          */
          do {
              /**
              * move the last letter to the beginning of extra
              */
              array_unshift($extra,array_pop($wordArray));

              /**
              * any suggestions
              */
              $suggest = pspell_suggest($int, implode('',$wordArray));

              /**
              * did we get one?
              */
              if(!empty($suggest)) {
                  /**
                  * need lowercase
                  */
                  $suggest[0] = strtolower(trim($suggest[0]));

                  /**
                  * add to found array
                  */
                  $found[$suggest[0]] = implode('',$extra);
              }

              /**
              * if no more to process end
              */
          } while(count($wordArray) > 1);

          /**
          * sort array by key length (sortLength is another method of the class, not shown here)
          */
          uksort($found, array($this,'sortLength'));

          /**
          * loop through all found
          */
          foreach($found as $string => $extra) {
              /**
              * is the extra a word
              */
              if(pspell_check($int,$extra)) {
                  /**
                  * add to end of string
                  */
                  $string = $string.' '.$extra;
              }else{
                  /**
                  * not a word look for suggestions
                  */
                  $suggest = pspell_suggest($int, $extra);

                  /**
                  * did we get any
                  */
                  if(empty($suggest)) {
                      continue;
                  }

                  /**
                  * add best suggestion to end
                  */
                  $string = $string.' '.strtolower(trim($suggest[0]));
              }

              /**
              * remove spaces
              */
              $stringCompare = str_replace(' ','',$string);

              /**
              * compare with original word (the removed digits are the same on both sides)
              */
              if($stringCompare === $word) {
                  /**
                  * these are our keywords :D
                  */
                  $suggestions[$lang] = trim($removed['start'].' '.$string.' '.$removed['end']);

                  /**
                  * break the loop
                  */
                  break;
              }
          }
      }

      if (isset($suggestions['en']) && isset($suggestions['fr']))
      {
          $suggestions['bilingual'] = true;
      }

      /**
      * return our suggestions
      */
      return $suggestions;
  }
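
For the enchant conversion mentioned above, a hedged starting point: enchant_dict_check() and enchant_dict_suggest() play the roles of pspell_check() and pspell_suggest(). The helper name enchantIsWord and the 'en_US' tag are assumptions for the example; which dictionaries exist depends on what is installed on the server.

```php
<?php
// Hypothetical helper: check one word with enchant. Returns null when the
// extension or the requested dictionary is not available on this machine.
function enchantIsWord($word, $tag = 'en_US')
{
    if (!function_exists('enchant_broker_init')) {
        return null; // enchant extension not loaded
    }
    $broker = enchant_broker_init();
    if (!$broker || !enchant_broker_dict_exists($broker, $tag)) {
        return null; // no dictionary installed for this tag
    }
    $dict = enchant_broker_request_dict($broker, $tag);
    if (!$dict) {
        return null;
    }
    return enchant_dict_check($dict, $word); // counterpart of pspell_check()
}

var_dump(enchantIsWord('loan'));
```

In real use you would request the broker and dictionaries once and keep them in something like $this->container, rather than creating them on every call.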

08-06-2010, 06:24 PM   #9
Erdy
Senior Member
Join Date: Sep 2006
Location: London, UK
Posts: 1,900

Thanks baxter. This could be useful later on.
All times are GMT -7.