Thursday, February 14, 2013

PHP Tutorial: Making a webcrawler!


Don't know what a web crawler is? Web crawlers are used by search engines like Google, Yahoo and Bing.
These search engines have bots running 24/7, searching for new websites. How do these bots work? Easy.
Keep reading to find out how to make one yourself!

Let me show you in steps:

Step 1: The bot starts and gets a URL.
Step 2: The bot opens the URL and searches it for links.
Step 3: The bot removes links that don't work.
Step 4: The bot adds the links to a database.
Step 5: The bot goes back to Step 1 with one of the links it found.
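To make the five steps concrete, here is a minimal sketch of the loop in plain PHP. It is not the code from the downloadable crawler: crawl(), the $getLinks callback and the $maxPages limit are hypothetical names introduced for this example. Fetching the links is passed in as a callback so the loop can be tried offline, and the page limit stops the 'infinite' crawler from really running forever.

```php
<?php
// Hypothetical sketch of the crawl loop (Steps 1-5).
// $getLinks is a callback: given a URL, it returns the links found on that page.
function crawl(string $startUrl, callable $getLinks, int $maxPages = 10): array
{
    $queue   = [$startUrl]; // Step 1: URLs waiting to be crawled
    $visited = [];          // URLs already crawled, used as a set (URL => true)

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue; // this URL was queued twice; crawl it only once
        }
        $visited[$url] = true;

        // Step 2: open the URL and collect its links.
        foreach ($getLinks($url) as $link) {
            // Step 3: skip empty links (a real bot would also drop broken ones).
            // Step 4 would store $link in the database here.
            if ($link !== '' && !isset($visited[$link])) {
                $queue[] = $link; // Step 5: crawl this link later
            }
        }
    }
    return array_keys($visited); // every URL that was crawled, in order
}

// Offline usage example with a fake three-page "web".
$fakeWeb = [
    'http://a.example/' => ['http://b.example/', 'http://c.example/'],
    'http://b.example/' => ['http://a.example/'],
    'http://c.example/' => [],
];
$pages = crawl('http://a.example/', fn(string $u): array => $fakeWeb[$u] ?? []);
print_r($pages);
```

With a real fetcher, $getLinks would load the page (for example with simple_html_dom) and return the href values it finds.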

Let's get started making one, shall we?

What you need:
  • Brains
  • Some PHP knowledge (variables, functions, etc.)
  • Some HTML knowledge (how to make a link)
  • A webserver (see below)
  • A MySQL database (comes with the webserver below)
  • "simple_html_dom.php" (Download: HERE)

Contents:
  • Setting up a webserver
  • Concept
  • Making the PHP crawler
  • Variations

Setting up a webserver:
Want to know how to set up a webserver?
It's super easy! But do you want it on your PC or on a USB stick?

For a PC, download: WAMP WebServer.
Setting it up won't be that hard. Just follow the install instructions and start it.
Now go to: http://localhost/
(no .com, .org or .net)
Put your ".php" files in: {wamp install directory}/www/

For a USB stick, download: EasyPHP - The portable webserver!
The setup is the same as for WAMP, but make sure the install location is on your USB stick.

NOTE: Both webservers come with MySQL preinstalled!
To connect, use (the older mysql_connect() function was removed in PHP 7):

  $db = new mysqli("localhost", "root", "");

Server: localhost
Username: root
Password: (none)
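As a hedged sketch of Step 4, here is a hypothetical saveLinks() helper that stores each link only once using PDO prepared statements (PDO and mysqli are PHP's current MySQL APIs). It assumes you've created a table named links with a url column; the table, the column and the database name crawler in the usage comment are all assumptions for this example.

```php
<?php
// Hypothetical helper for Step 4: store each link only once.
// Assumes a table created with something like:
//   CREATE TABLE links (url VARCHAR(255) NOT NULL UNIQUE)
function saveLinks(PDO $db, array $links): int
{
    $check  = $db->prepare('SELECT COUNT(*) FROM links WHERE url = ?');
    $insert = $db->prepare('INSERT INTO links (url) VALUES (?)');
    $added  = 0;

    foreach (array_unique($links) as $link) {
        $check->execute([$link]);
        $exists = (int) $check->fetchColumn() > 0;
        $check->closeCursor(); // free the result before running the next query
        if (!$exists) {
            $insert->execute([$link]);
            $added++;
        }
    }
    return $added; // number of new links stored
}

// Usage with the WAMP/EasyPHP defaults (database name "crawler" is an assumption):
// $db = new PDO('mysql:host=localhost;dbname=crawler', 'root', '');
// $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
// echo saveLinks($db, ['http://example.com/', 'http://example.com/about']);
```

Checking before inserting keeps the sketch portable between databases; with MySQL you could instead rely on the UNIQUE column and use INSERT IGNORE.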

Concept:
The concept is very easy. As said before, the bot runs in a loop
and adds the links it finds to the database.
Now, how are we going to get the HTML source code?
"simple_html_dom.php" has all the functions we need!
So let's include it in our PHP file first:

  include_once("simple_html_dom.php");

OK! Now that we've included the extension, we can use it.
If we want the source code of a URL, we need to define the URL and load it first:

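  <?php
  // Include the extension
  include_once("simple_html_dom.php");
  // Define the URL and load its source
  $url = "";
  $html = new simple_html_dom();
  $html->load_file($url);
  // The page source is now in $html
  ?>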
OK, let me explain it line by line:
  1. Opening the PHP file: "<?php"
  2. Comment (non code)
  3. Including the extension
  4. Comment (non code)
  5. Defining variable "$url". This is the page that we will grab the source from!
  6. Defining variable "$html". This is an object of the extension's "simple_html_dom" class (read on).
  7. Calling the method "load_file" on "$html". This loads the source of "$url"!
  8. Comment (non code)
  9. Closing the PHP file: "?>"
There we go: we've successfully loaded the page into the "$html" variable!

Making the base PHP file:
We've already made a good base, but we want to extend it so that it echoes out the links.
We've got the source in "$html"; now we need to find all the "<a href="blabla"></a>" tags and extract the href link.
Luckily, "simple_html_dom.php" already has such a function. It looks like this:

  $html->find("a");

OK, this function returns an array with all the "<a>" tags in the source!
To go through all the "<a>" tags quickly, we're going to use a "foreach(){}" loop.
And I'm going to build on the code from Concept:

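  <?php
  // Include the extension
  include_once("simple_html_dom.php");
  // Define the URL and load its source
  $url = "";
  $html = new simple_html_dom();
  $html->load_file($url);
  // Echo the href of every <a> tag
  foreach($html->find("a") as $link)
  {
      echo $link->href . "<br />";
  }
  ?>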
Now let me explain it:

  9. The foreach loops over the array and assigns each entry to "$link" in turn, until there are none left.
     So it goes through the array in order: entry 0, 1, 2, 3 and so on.
  11. This echoes out the href of the "<a>" tag and adds a line break.
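If you'd rather not depend on "simple_html_dom.php", PHP's built-in DOM extension can do the same job. The sketch below swaps in DOMDocument (this is not what the tutorial uses) and parses an HTML string; the function name extractLinks() is made up for this example. To crawl a real page, you would first fetch its source, for example with file_get_contents($url).

```php
<?php
// Hypothetical alternative to $html->find("a") using PHP's built-in DOM extension.
function extractLinks(string $source): array
{
    if ($source === '') {
        return []; // loadHTML() refuses empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // '' when the tag has no href
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return $links;
}

$source = '<p><a href="http://example.com/">Home</a> <a name="top">No link</a>'
        . ' <a href="/about">About</a></p>';
foreach (extractLinks($source) as $link) {
    echo $link . "<br />\n";
}
```

The "<a name=...>" tag has no href, so it is skipped, just like an empty link would be.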

Now change "$url" to a site and watch the magic happen.
This is my output (I have changed the URLs a bit for privacy):

Nice! We've got results.
So what have you learned?
  • How to include extensions.
  • How to use extensions.
  • How to get source code.
  • How to use "foreach(){}".
  • How to crawl the web!
Now, if you want to make an 'infinite crawler', just apply your basic PHP skills and turn this into a loop.
Again, see Concept for what an 'infinite crawler' has to do.
You can put all the URLs found on a website in an array or directly into a database.
Then take those URLs and crawl them.
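Before found links can be crawled in turn, relative links such as "/about" have to be turned into full URLs, and links a crawler can't follow (mailto:, javascript:, #anchors) should be dropped, which is one simple reading of Step 3. The helper below is a simplified sketch with a hypothetical name, normalizeLink(), built on parse_url(); it is not a full RFC 3986 resolver (it does not collapse "../" segments, for example). Checking whether a link actually responds would need an extra HTTP request, for example with get_headers().

```php
<?php
// Hypothetical helper: turn an href into an absolute http(s) URL, or null if it can't be crawled.
// Simplified: "../" segments are left as they are.
function normalizeLink(string $href, string $pageUrl): ?string
{
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null; // empty link or in-page anchor
    }
    $href = explode('#', $href, 2)[0]; // drop any #fragment

    $base = parse_url($pageUrl);
    if (!isset($base['scheme'], $base['host'])) {
        return null; // the page URL itself isn't absolute
    }
    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');
    $path = $base['path'] ?? '/';

    if (substr($href, 0, 2) === '//') {                       // protocol-relative: //host/path
        $href = $base['scheme'] . ':' . $href;
    } elseif ($href[0] === '/') {                             // root-relative: /path
        $href = $origin . $href;
    } elseif ($href[0] === '?') {                             // query only: ?page=2
        $href = $origin . $path . $href;
    } elseif (!preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) { // relative to the current folder
        $href = $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
    }

    // Keep only web links (this drops mailto:, javascript:, ftp:, ...).
    $scheme = strtolower((string) parse_url($href, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true) || filter_var($href, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    return $href;
}

echo normalizeLink('/about', 'http://example.com/blog/post.html'), "\n";
```

In a crawl loop, you would call normalizeLink($link, $url) on every found link and only queue the results that aren't null.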

Variations:
Of course, crawlers can vary a lot. I made one that shows you the links found on a site. You can click these links and it will crawl the clicked link.
It's like an 'infinite crawler', but with human pauses in between.
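The clickable variation can be sketched by turning every found link into a link back to the crawler script itself, with the target passed in the query string. The script name crawler.php, the url parameter and the crawlLink() function are assumptions for this example, not taken from the downloadable source. urlencode() keeps the target URL intact inside the query string, and htmlspecialchars() keeps it from breaking the HTML.

```php
<?php
// Hypothetical helper for the "click to crawl" variation.
// Builds an <a> tag that sends the clicked URL back to crawler.php as ?url=...
function crawlLink(string $foundUrl, string $script = 'crawler.php'): string
{
    $target = $script . '?url=' . urlencode($foundUrl);
    return '<a href="' . htmlspecialchars($target) . '">'
         . htmlspecialchars($foundUrl) . '</a><br />';
}

// On each page view, crawler.php would read the clicked link from $_GET, e.g.:
// $url = $_GET['url'] ?? 'http://example.com/';
echo crawlLink('http://example.com/?a=1&b=2');
```

Since anyone can type their own ?url= value, it's worth checking that the incoming URL starts with http:// or https:// before loading it, because functions like file_get_contents() will also open local files.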

Download: Dropbox link to: Crawler_source_code.rar

For those who don't trust me:
Jotti - Online virus scanner

Jotti is an online virus scanner. It scans a file with 21 different virus scanners.
I've already uploaded the file to Jotti, so you can view the results at the link above.

Thank you!
Thanks for reading this post. If you'd like more tutorials like this one, subscribe to this blog on the right side. Just enter your e-mail and you'll get every new post delivered to your inbox!

Greets, Tim.