02-02-2006, 06:03 PM   #1
mholt
DNOA Member
Join Date: May 2004
Location: Utah
Posts: 5,041

Retrieving Alexa Ranking Programmatically?

Hey all,

I'm having a hard time trying to figure out how to get an Alexa ranking with PHP.

In the past I've been screen scraping, but they've added new (and, unfortunately, randomized) HTML/XML tags between each digit in the ranking, so I can't grab it by scraping anymore.

I signed up for an Amazon Web Services (AWS) account at http://aws.amazon.com but am still a bit confused...

Does anybody know how to do this?

Thanks,
-Matt
__________________
codeboards

A high-quality community of programmers -- Join today and post! We want new members!

02-02-2006, 06:24 PM   #2
tgo
NamePros Regular
Join Date: Aug 2005
Posts: 317

Can I ask why your screen scraping is not working anymore? I have my own screen scraper script in ASP that is working just fine by reading the actual Alexa page and getting the values. There is an HTML comment just before the numbers you have to read; just look for that.

http://seohelp.info/alexarankings.asp

I haven't extensively tested this script with many domains, so I don't know if they all work, but from what I can tell it's still possible to get the data.

02-02-2006, 06:34 PM   #3
mholt

Look at the source code: they've placed strange tags between the digits of the rankings. I always used to get the rankings properly, but now when I try it they come up as "0", and I haven't changed anything.

02-02-2006, 06:43 PM   #4
tgo

Heh, I never noticed this change; my code seems to have been immune to it. My page ends up with all those useless tags in its source too, but since they do nothing, they don't display. I grab all the data between the TD tags.

I do a search on the page for "Alexa Web Information Service.-->"

When found, I grab everything between the following TD tags, which includes all the useless code. Since the useless code doesn't display in a browser, it just displays the numbers.
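
The marker-then-cell lookup described above could look roughly like this. It is a sketch in Python rather than ASP or PHP; the function name and the sample page are made up for illustration, and only the marker text comes from this thread. The real page layout may differ.

```python
import re

# Marker text quoted in the post above; everything else here is an assumption.
MARKER = "Alexa Web Information Service.-->"

def extract_cell_after_marker(html, marker=MARKER):
    """Return the raw contents of the first <td>...</td> after the marker, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    # Search only the part of the page that follows the marker.
    match = re.search(r"<td\b[^>]*>(.*?)</td>", html[start + len(marker):],
                      re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None

# Made-up sample page, not real Alexa markup.
sample = (
    "<table><tr><td>Rank</td>"
    "<!-- Data provided by Alexa Web Information Service.-->"
    '<td class="rank">1<span class="x9"></span>2,<b z="4"></b>345</td></tr></table>'
)
print(extract_cell_after_marker(sample))
# → 1<span class="x9"></span>2,<b z="4"></b>345
```

The returned string still contains the junk tags, which is why it looks fine in a browser but not when treated as a number.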

02-02-2006, 07:11 PM   #5
mholt

Ah... maybe I should remove the number formatting functions (custom ones, I guess I'll live w/o them)... I don't want excess code in the source either but I guess these random tags won't matter...

Well, thanks, I'll see what happens tomorrow when I work on it.

02-02-2006, 07:24 PM   #6
tgo

Now that this has been brought to my attention, I think just converting the gathered text to numeric-only values may eliminate the extra code. I will test that later and see if I can rid myself of the crazy code.

Or just code a remove-all-tags function to remove any and all tags from the data retrieved between the TD tags. It should not be hard; it's just looking for "<" and ">" in the extracted text and removing all the text in between.

02-02-2006, 07:45 PM   #7
mholt

Yeah, that's hard, but there are numbers inside the < and > characters... but yeah, I should try a strip_tags function and see if that helps. But they aren't standard HTML tags... hmm..

I'll figure this out sooner or later
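
To illustrate the concern about numbers inside the tags, here is a small Python sketch comparing a digits-only filter with stripping the tags first. The cell contents and both function names are invented for the example; the actual tags on the page may look different.

```python
import re

def digits_only(text):
    """Keep every digit character, including digits that sit inside tags."""
    return "".join(ch for ch in text if ch.isdigit())

def strip_then_digits(text):
    """Remove anything shaped like <...> first, then keep the digits."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    return digits_only(without_tags)

# Made-up cell contents: the tag attributes contain digits.
cell = '1<span class="x9"></span>2,<b z="4"></b>345'

print(digits_only(cell))        # → 1924345 (digits from the tags leak in)
print(strip_then_digits(cell))  # → 12345
```

The pattern only looks at the angle brackets, so it does not matter whether the tag names are standard HTML.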

02-02-2006, 07:55 PM   #8
tgo

Gotta think outside the box. :P

You are thinking of looking for the tags specifically. This is not how to make a strip-tags function.

First you look for a "<", then find the next ">" and delete everything from one to the other; then do it again, until you don't find any more "<".

All the weird tags start with a "<" and end with a ">", so if you know you don't want anything in between these two characters, it's fairly easy to remove the elements in a small loop.

I don't know PHP, but in ASP this is easy: I just grab the info between the TD tags, put it in a temp variable, then call a remove-all-tags function that returns the value without any tags.
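
The small loop described above might be sketched like this in Python (the ASP original is not shown in the thread, so the function name and the handling of an unmatched "<" are assumptions):

```python
def remove_all_tags(text):
    """Delete every "<...>" span by finding "<" and then the next ">"."""
    while True:
        start = text.find("<")
        if start == -1:
            break  # no "<" left, nothing more to remove
        end = text.find(">", start)
        if end == -1:
            break  # unmatched "<": leave the remainder untouched
        # Cut out the tag, brackets included, and scan again.
        text = text[:start] + text[end + 1:]
    return text

# Made-up cell contents with junk tags between the digits.
print(remove_all_tags('1<span class="x9"></span>2,<b z="4"></b>345'))  # → 12,345
```

Note that a loop this simple treats any "<" followed later by a ">" as a tag, so it is only suitable for text like the ranking cell where that is known to be true.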

02-02-2006, 08:03 PM   #9
mholt

It's similar in PHP... thanks

02-14-2006, 03:27 PM   #10
col
NamePros Regular
Join Date: Jan 2005
Location: Land of the m00
Posts: 723

Just fetch the number as mentioned above, with all the mumbo-jumbo tags included. Then just use strip_tags($string_with_mumbojumbo_tags) to clean it up.
I guess you've already found a way to fix this, but I just wanted to put up a solution at the end of the thread in case someone, like I just did, tries to find the answer here in the forum.
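
PHP's strip_tags does the cleanup in one call. For anyone doing the same thing outside PHP, a rough Python analogue could be built on the standard library's html.parser; the names strip_markup and parse_rank are invented here, and the sample strings are made up rather than taken from the real page.

```python
from html.parser import HTMLParser

class _TextOnly(HTMLParser):
    """Collects text content and ignores every tag, standard or not."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_markup(markup):
    """Return the text of `markup` with all tags dropped."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)

def parse_rank(markup):
    """Strip tags and thousands separators; return an int, or None if not numeric."""
    text = strip_markup(markup).replace(",", "").strip()
    return int(text) if text.isdigit() else None

print(parse_rank('1<span class="x9"></span>2,<b z="4"></b>345'))  # → 12345
print(parse_rank("<td>n/a</td>"))                                 # → None
```

Returning None for non-numeric text makes it easier to notice when the page layout changes again, instead of silently reporting a rank of 0.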
__________________
The more I think
the more confused I get...