Extract keywords from domain name


softgroups

Hey Softgroups,

The way I accomplished it was as follows:

Strip the TLD.
Create an array from the remaining characters with str_split.
Using a dictionary (the pspell/enchant classes), add one letter at a time to the candidate word and test it; if it comes back as a word, save it.
Continue on to see if there is a longer word.
If no longer word exists, remove the letters of the captured word from the array.
Continue with the remaining letters to see if they make up words.
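The steps above can be sketched roughly as follows. This is only an illustration, not the original class: the function name splitKeywords and the in-memory $dictionary array are made up for the example and stand in for the pspell/enchant lookups, and it assumes a bare domain with no "www." prefix. It takes the longest word at each position and does not backtrack if that choice leaves leftover letters.

```php
<?php
// Hypothetical helper: split a domain label into dictionary words.
// $dictionary is an assumed in-memory word set (word => anything non-null);
// the original approach used pspell/enchant lookups instead.
function splitKeywords(string $domain, array $dictionary): array
{
    // Strip the TLD: keep everything before the first dot.
    $label = strtolower(explode('.', $domain)[0]);
    $words = [];
    $pos = 0;
    $len = strlen($label);

    while ($pos < $len) {
        $match = '';
        // Grow the candidate one letter at a time and remember the longest hit.
        for ($end = $pos + 1; $end <= $len; $end++) {
            $candidate = substr($label, $pos, $end - $pos);
            if (isset($dictionary[$candidate])) {
                $match = $candidate;
            }
        }
        if ($match === '') {
            // No word starts here; skip one character and carry on.
            $pos++;
            continue;
        }
        $words[] = $match;
        $pos += strlen($match);
    }
    return $words;
}

$dictionary = array_flip(['red', 'loan', 'lo']);
print_r(splitKeywords('redloan.com', $dictionary));
```

With this toy dictionary, "redloan.com" comes back as "red" and "loan", because "loan" is preferred over the shorter "lo".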

A lot more went into it, but that will give you a start. I don't have the class any longer because we decided to use a service that offered this as part of its package and also did language detection, etc.
 
Okay, maybe this sounds silly, but I guess you can use Google to split the word.
For example, if you search for "redloan", Google will suggest searching for "red loan" instead.
So I guess you could use some cURL code that points to Google.
But of course you could get your website banned if you send too many requests.

I guess you should use a dictionary then ;)
 
I'd do it similarly to baxter, using a dictionary to search the name for keywords.

Give a shout if you're interested in the coding behind this, I could probably help you in PHP if you wanted.
 
With baxter's method, do you need to upload your own wordlists to your database to make the comparison, or is there a PHP library that comes with wordlists in various languages?

The reason I'm asking is that baxter mentioned pspell, and I'm wondering how that or any other PHP library works without its own wordlist.

What if you need to run these queries in large numbers? For instance, I need 130,000 queries per day. Is baxter's method going to handle this?
 
I've never used either and think it would probably be easier to use your own wordlist, either in a database or in a file which the script could reference. You can download word lists for free from a number of sites so getting hold of them won't be a problem.
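As a rough sketch of the file-based option, assuming a plain text list with one word per line: loadWordList is a made-up name, and the temporary file below only stands in for a downloaded word list.

```php
<?php
// Hypothetical loader: read a one-word-per-line file into a lookup table.
function loadWordList(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read word list: $path");
    }
    $words = [];
    foreach ($lines as $line) {
        // Normalise case and whitespace so lookups are consistent.
        $word = strtolower(trim($line));
        if ($word !== '') {
            $words[$word] = true;
        }
    }
    return $words;
}

// Demo with a temporary file standing in for a downloaded word list.
$path = tempnam(sys_get_temp_dir(), 'words');
file_put_contents($path, "Red\nloan\n\nface\nbook\n");
$dictionary = loadWordList($path);
unlink($path);

var_dump(isset($dictionary['red']));     // known word
var_dump(isset($dictionary['redloan'])); // not in the list
```

Keeping the words as array keys means each check is a single isset() call rather than a scan through the list.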

I'm not fully aware of how those two libraries work, but Aspell (which is the current option for non-Windows PHP) looks like it has a lot of languages. Look here: Supported - GNU Aspell 0.60.6

Because you've got everything on your own server, there shouldn't be a problem with scaling this to a huge number of queries. Maybe introduce some caching through PHP to save resources?
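One simple form of caching is to remember results for names already processed. The sketch below is an assumption about what that could look like, not a tested setup: cachedLookup is a made-up name, the str_split call is only a placeholder for the real splitter, and the cache lives for a single PHP request (sharing it between requests would need something like a database table or a shared-memory cache).

```php
<?php
// Hypothetical wrapper: remember results so a repeated name skips the lookup.
function cachedLookup(string $name, callable $extract): array
{
    static $cache = [];
    if (!array_key_exists($name, $cache)) {
        $cache[$name] = $extract($name);
    }
    return $cache[$name];
}

$calls = 0;
$extract = function (string $name) use (&$calls): array {
    $calls++;                   // count how often the real work runs
    return str_split($name, 3); // placeholder for the real splitter
};

cachedLookup('redloan', $extract);
cachedLookup('redloan', $extract);
echo $calls, "\n"; // the second call is served from the cache
```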
 
Thank you sourcez, especially for the aspell link. I'm not a programmer and I don't understand much of this except it could be useful for my website.
 
If you're using the two classes, you only add words that you want treated as words. For example, you may want to treat "facebook" as a legitimate word so it doesn't split into "face book".
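To illustrate the idea without pspell installed, here is a small sketch in which a custom word list is checked before the general dictionary. The function name and both arrays are hypothetical, and it only tries two-way splits to keep the example short. With pspell itself, pspell_add_to_session() is the call that adds a word to the current session's list.

```php
<?php
// Hypothetical: custom words are checked before the general dictionary.
function extractWithCustomWords(string $label, array $custom, array $dictionary): array
{
    $label = strtolower($label);
    // A whole-label match against the custom list wins outright.
    if (isset($custom[$label])) {
        return [$label];
    }
    // Otherwise try every two-way split against the general dictionary.
    for ($i = 1; $i < strlen($label); $i++) {
        $left = substr($label, 0, $i);
        $right = substr($label, $i);
        if (isset($dictionary[$left]) && isset($dictionary[$right])) {
            return [$left, $right];
        }
    }
    // Nothing matched; hand the label back unsplit.
    return [$label];
}

$custom = ['facebook' => true];
$dictionary = ['face' => true, 'book' => true];

print_r(extractWithCustomWords('facebook', $custom, $dictionary)); // kept whole
print_r(extractWithCustomWords('facebook', [], $dictionary));      // split in two
```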

For just the keyword extraction, you should have absolutely no problem doing 100,000 per day. It's actually very fast.

I still found some of the old prototype code on my machine; I've supplied it below in case it helps.

I'd run it and give you the numbers it can do, but I moved to PHP 5.3 and they removed pspell, so you're best off converting this code to use enchant.
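For anyone attempting that conversion, the enchant calls that roughly correspond to pspell_check and pspell_suggest are enchant_dict_check and enchant_dict_suggest. The wrapper below is a hedged sketch: enchantLookup and the 'en_US' tag are assumptions for the example, and it returns null when the extension or dictionary is missing rather than failing.

```php
<?php
// Hypothetical wrapper: returns [isWord, suggestions], or null when the
// enchant extension or the requested dictionary is unavailable.
function enchantLookup(string $word, string $tag = 'en_US'): ?array
{
    if (!extension_loaded('enchant')) {
        return null;
    }
    $broker = enchant_broker_init();
    if (!enchant_broker_dict_exists($broker, $tag)) {
        return null;
    }
    $dict = enchant_broker_request_dict($broker, $tag);
    return [
        enchant_dict_check($dict, $word),   // stands in for pspell_check
        enchant_dict_suggest($dict, $word), // stands in for pspell_suggest
    ];
}

$result = enchantLookup('loan');
var_dump($result);
```

In real use you would probably create the broker and dictionary once and reuse them, rather than on every call as this short example does.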

Cheers,

Jay

This function was part of a class, so $this->container refers to an array of dictionaries; in our case we were using English and French. Hope this helps.

Code:
public function suggest($word) {
      /**
      * setup our suggestion array
      */
      $suggestions = array();
      
      /**
      * init removed array
      */
      $removed = array('end' => '','start' => '');

      /**
      * any numbers at the beginning to remove
      */
      if(true == preg_match('/^(\d+)/',$word,$digits)) {
          /**
          * add to our removed
          */
          $removed['start'] = $digits[1];

          /**
          * remove from the start only (str_replace would also strip
          * the same digits elsewhere in the word)
          */
          $word = (string) substr($word, strlen($removed['start']));
      }

      /**
      * any numbers at the end to remove
      */
      if(true == preg_match('/(\d+)$/',$word,$digits)) {
          /**
          * add to our removed
          */
          $removed['end'] = $digits[1];

          /**
          * remove from the end only (str_replace would also strip
          * the same digits elsewhere in the word)
          */
          $word = (string) substr($word, 0, -strlen($removed['end']));
      }

      /**
      * loop through and look for suggestions
      */
      foreach($this->container as $lang => $int) {
          /**
          * make sure we check for single word
          */
          if(true === pspell_check($int, $word)) {
             /**
             * we're correct :D
             */
             $suggestions[$lang] = $removed['start'].' '.$word.' '.$removed['end'];

             /**
             * nothing more to see
             */
             continue;
          }

          /**
          * get the suggestions
          */
          $suggest = pspell_suggest($int, $word);

          /**
          * did we get any suggestions
          */
          if(count($suggest) > 0) {
              /**
              * make sure lowercase
              */
              $suggest[0] = strtolower(trim($suggest[0]));

              /**
              * does it equal our word?
              */
              if($word == str_replace(' ','',$suggest[0])) {
                  /**
                  * we're correct :D
                  */
                  $suggestions[$lang] = $removed['start'].' '.$suggest[0].' '.$removed['end'];

                  /**
                  * nothing more to see
                  */
                  continue;
              }

          }
          
          /**
          * init our variables
          */
          $extra = array();
          $found = array();              

          /**
          * split word into an array
          */
          $wordArray = str_split($word);

          /**
          * loop through all possible suggestions
          */
           do{
                  /**
                  * add to the beginning of extra
                  */
                  array_unshift($extra,array_pop($wordArray));

                  /**
                  * any suggestions
                  */
                  $suggest = pspell_suggest($int, implode('',$wordArray));

                  /**
                  * did we get one?
                  */
                  if(count($suggest) > 0) {
                      /**
                      * need lowercase
                      */
                      $suggest[0] = strtolower(trim($suggest[0]));

                      /**
                      * add to found array
                      */
                      $found[$suggest[0]] = implode('',$extra);
                  }

                  /**
                  * if no more to process end
                  */
              }while(count($wordArray) > 1);

              /**
              * sort array by key length
              */
              uksort($found, array($this,'sortLength'));

              /**
              * loop through all found
              */
              foreach($found as $string => $extra) {
                  /**
                  * is the extra a word
                  */
                  if(pspell_check($int,$extra)) {
                      /**
                      * add to end of string
                      */
                      $string = $string.' '.$extra;
                  }else{
                      /**
                      * not a word look for suggestions
                      */
                      $suggest = pspell_suggest($int, $extra);
                      
                      /**
                      * did we get any
                      */
                      if(count($suggest) == 0) {                          
                          continue;
                      }

                      /**
                      * add best suggestion to end
                      */
                      $string = $string.' '.strtolower(trim($suggest[0]));
                  }

                  /**
                  * remove spaces
                  */
                  $stringCompare = str_replace(' ','',$string);

                  /**
                  * compare with original word
                  */
                  if($removed['start'].$stringCompare.$removed['end'] == $removed['start'].$word.$removed['end']) {
                      /**
                      * these are our keywords :D
                      */
                      $suggestions[$lang] = $removed['start'].' '.$string.' '.$removed['end'];

                      /**
                      * break the loop
                      */
                      break;
                  }
         }
      }

      if (true === isset($suggestions['en']) AND true === isset($suggestions['fr']))
      {
          $suggestions['bilingual'] = true;
      }

      /**
      * return our suggestions
      */
      return $suggestions;
  }
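The snippet calls a sortLength method that isn't included. Going by the "sort array by key length" comment, it was presumably a comparator along these lines; the longest-first direction is an assumption, and this is a standalone function rather than the original class method.

```php
<?php
// Hypothetical comparator for uksort: longer keys sort first.
function sortLength(string $a, string $b): int
{
    return strlen($b) <=> strlen($a);
}

$found = ['re' => 'dloan', 'red' => 'loan', 'r' => 'edloan'];
uksort($found, 'sortLength');
print_r(array_keys($found)); // longest key first: red, re, r
```

Inside the class, the same body would go in a sortLength method so that array($this,'sortLength') resolves.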
 
Thanks baxter. This could be useful later on.
 