Google Clone Script – A Practical Guide

Many of us have tried to build a search engine from a search engine script or Google clone script. This article gives you practical information on working with one. To build a Google clone, you first need to choose the right Google clone script. You also need a good search engine theme. Your choice of hosting company also matters, depending on the script you have chosen. Below I give a practical overview of the components needed to launch your own search engine service online.

Choosing the Right Type of Google Clone Script

Google clone scripts come in two kinds:

Meta Search Engine Scripts

A meta search engine script uses the result databases of third-party search engines and presents the results in a friendly way. It does not perform the far more complicated task of crawling the World Wide Web itself.

Inout Search Engine and K Search are popular meta search engine scripts. Inout Search Engine works with legal API keys from the major search engines, which lets you run a completely legal meta search engine service.
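As a rough illustration of what a meta search script does behind the scenes, here is a minimal Python sketch that merges ranked result lists from two upstream engines and drops duplicate URLs. The result format, the `merge_results` and `normalize` names, and the sample data are all assumptions made for this example; real scripts such as Inout Search Engine or K Search may merge results quite differently.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparison key: lowercase host plus path, no trailing slash."""
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")


def merge_results(*ranked_lists):
    """Interleave ranked result lists round-robin, skipping URLs already seen."""
    merged, seen = [], set()
    for tier in zip_longest(*ranked_lists):
        for hit in tier:
            if hit is None:  # this engine returned fewer results than the others
                continue
            key = normalize(hit["url"])
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


# Hypothetical responses from two upstream search APIs.
engine_a = [
    {"title": "Apache Hadoop", "url": "http://hadoop.apache.org/"},
    {"title": "Apache HBase", "url": "http://hbase.apache.org/"},
]
engine_b = [
    {"title": "HBase home", "url": "http://HBase.apache.org"},
    {"title": "Hypertable", "url": "http://hypertable.com/"},
]

merged = merge_results(engine_a, engine_b)
for hit in merged:
    print(hit["title"], hit["url"])
```

Round-robin interleaving is only one simple choice; it treats every upstream engine as equally trustworthy, which may not be what you want in practice.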

Search Engine Scripts with Integrated Crawler/Bots

The other type of search engine or Google clone script is built more like the big search engines: it includes the crawling (search bot) logic along with the result display logic. These search engines are powerful but expensive to run, and you need some understanding of their architecture.

Sphider and Inout Spider are search engines of this kind. Sphider is an older system that runs on PHP and MySQL, whereas Inout Spider is built on a search engine architecture that supports distributed computing and distributed data handling. Inout Spider is widely regarded as the best complete search engine script currently available, built on an architecture similar to that of Google and Bing.
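To make the crawling logic a little more concrete, here is a minimal Python sketch of a breadth-first crawler loop. It is not how Sphider or Inout Spider is implemented; the `crawl` function, the `fetch` callback and the tiny in-memory "web" are assumptions used so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl from seed; fetch(url) returns HTML text or None."""
    frontier = deque([seed])  # URLs waiting to be fetched
    seen = {seed}             # URLs already queued, to avoid loops
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith("http") and link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages


# A tiny in-memory "web" stands in for real HTTP requests.
FAKE_WEB = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b": "<p>no links</p>",
}

pages = crawl("http://example.com/", FAKE_WEB.get)
print(list(pages))
```

A production crawler would also need to respect robots.txt, limit its request rate per host, and handle errors and redirects, all of which are left out here.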

You may also want to read another article of mine on this subject: Google Clone – Technology and Architecture Guide gives a detailed idea of how to build a reliable search engine service.

Choosing the Right Theme for your Google Clone Script

Just buying the software may not be enough. If you do not want to use the software's default look, you can order a theme or template, or have your own designer design your search engine script.

Deploying on the Right Server

Depending on which type of search engine or Google clone script you have chosen, you need to buy the right hosting. For example, most meta search engines, such as K Search and Inout Search Engine, will work on a standard shared hosting environment, while crawler engines like Inout Spider may require much more powerful machines.

Software like Inout Spider is designed to run in a much more powerful distributed environment, so it is recommended that you choose a hosting company that can provide you with additional nodes in a single network in the future if necessary. This helps you scale the data storage and computation capacity of your search engine as you need it.

 

Disclaimer
By Google clone, I do not mean an exact Google clone. The term Google is used as a synonym for ‘search engine’. This article is intended to help you create a standard search engine like Google, Bing, Yahoo, Baidu, etc.






Google Clone – Technology and Architecture Guide

A Guide to Build Your Own Google Clone Search Engine

Have you ever thought about building a fully featured search engine that works like Google or Bing? Google has emerged as one of the biggest companies on the Internet within a very short span of time. Many Internet entrepreneurs have been amazed by Google's success as a company. Thinking about the technology, how does Google work so fast and so powerfully? How does Google manage fault tolerance? Where does Google store the data for billions of web pages? Can you create a search engine like Google? If so, how?

Building a search engine like Google involves several aspects. First of all, it cannot be done overnight. It takes months or even years to crawl almost the entire web, store all the data, and rank the results. But you should usually be able to start producing search results within a couple of weeks.

Where do you store the data? Where does Google store its data? Google has its own NoSQL database called BigTable, in which it stores the search data. BigTable runs on a distributed system built on the highly reliable Google File System (GFS). This file system supports distributed computing across thousands of nodes attached to the network.

What Technology Should I Use?

You cannot run Google on MySQL. Period. Not even on Oracle, if you are aiming for a global-scale service. You need something similar to BigTable that works on a file system like Google's file system, GFS. But GFS and BigTable are Google-specific technologies; they are not open source, so you cannot install them on your own servers.

Hadoop: Hadoop includes a distributed file system, HDFS, which works very much like Google's GFS and is widely regarded as one of the best distributed file systems available. Hadoop is open source and is continuously researched and developed under Apache. It is a strong choice for running highly scalable, multi-machine applications such as search engines and analytics. Hadoop helps you connect thousands of nodes together to work as one expandable file system.
http://hadoop.apache.org/ 

HBase: HBase is a NoSQL (Not Only SQL) database that runs on top of Hadoop and can store petabytes of data. It is written in Java and is regarded as a reliable database. HBase is maintained by Apache.
http://hbase.apache.org/

Hypertable: Hypertable is another NoSQL database that runs on Hadoop. It is written in C++, and the Hypertable company claims that its performance is much faster than HBase's. Hypertable support is also very good, and it offers more flexibility in queries compared with HBase.
http://hypertable.com/

So, to run a Google clone, you should use either Hadoop + HBase or Hadoop + Hypertable.
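Both HBase and Hypertable keep rows sorted by row key, so the key you choose decides which pages end up stored next to each other. One convention, described in Google's BigTable paper, is to reverse the host name so that pages from the same domain sit together. The Python sketch below only builds such keys; the `row_key` function name is an assumption for this example, and no HBase or Hypertable client calls are shown.

```python
from urllib.parse import urlsplit


def row_key(url):
    """Build a row key with the host name reversed, e.g. org.apache.hadoop/docs."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    return reversed_host + (parts.path or "/")


urls = [
    "http://hbase.apache.org/book.html",
    "http://hypertable.com/",
    "http://hadoop.apache.org/docs",
]

# Sorting mimics the order in which a sorted key-value store would keep the rows.
keys = sorted(row_key(u) for u in urls)
for key in keys:
    print(key)
```

With keys like these, the two apache.org pages become neighbours in the table, which makes per-domain scans cheap.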

What Hardware Should I Use?

Of course, I understand that you do not want to start with your own datacenter. Google has its own ever-expanding datacenters around the world. The ideal way to start is to tie up with a datacenter or hosting company that can provide a series of nodes (computers) in a single network. The key reason you need nodes in a single network is that, as you add more nodes to a scalable distributed system in the future, nodes in the same physical network can significantly improve the performance of your search engine.
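Once you have several nodes, the crawl work has to be divided among them somehow. Here is a minimal Python sketch, under my own assumptions, that assigns each URL to a node by hashing its host name, so all pages of one site land on the same machine. The node names and the `node_for` function are hypothetical, and this is not how Hadoop, HBase or Hypertable distribute their data internally.

```python
import hashlib
from urllib.parse import urlsplit


def node_for(url, nodes):
    """Pick a node for a URL by hashing its host name."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names

a = node_for("http://example.com/page1", nodes)
b = node_for("http://example.com/page2", nodes)
print(a, b)  # both pages of the same host map to the same node
```

Simple modulo hashing reassigns most hosts whenever the number of nodes changes; consistent hashing is a common alternative if you expect to add nodes often.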

How Can I Code a Google Clone Application?

Here comes the trickiest and most interesting part of your journey to build a Google clone search engine. No matter how well you choose the technology and the infrastructure, if the code is not powerful and not designed for scalability, your spider will not be effective. I cannot cover all the components of the software logic and the algorithms needed to build a spider here. However, the diagram below, found on the Inout Spider site, will give you a really good idea of the major components required to build one. Inout Spider is a commercial application (widely regarded as a powerful search engine data spider application and a standard Google clone script) that works on Hadoop and Hypertable technologies. So if you cannot code it yourself, I recommend that you consider Inout Spider.

 

Google Clone Algorithm of Inout Spider

(Source: http://www.inoutscripts.com/products/inout_spider/)
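Whatever the exact component breakdown, any search engine needs some way to index crawled pages and rank matches for a query. As a very rough sketch of that step, the Python example below builds a small in-memory inverted index and ranks pages by how often the query terms occur. It is my own simplified illustration, not Inout Spider's algorithm, and the function names and sample pages are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each term to a dict of {url: number of occurrences}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing every query term, ranked by summed term counts."""
    terms = tokenize(query)
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    common = set(postings[0]).intersection(*postings[1:])
    scores = {url: sum(p[url] for p in postings) for url in common}
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))


pages = {
    "http://example.com/a": "distributed search engine on a distributed file system",
    "http://example.com/b": "a search engine theme",
    "http://example.com/c": "distributed database",
}

index = build_index(pages)
print(search(index, "distributed search"))
print(search(index, "search engine"))
```

Real engines rank with far richer signals (link analysis, term weighting, freshness), and at scale the index would live in a distributed store rather than in one process's memory.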

Summary

Building a search engine like Google is never an easy task, or else we would have seen many more Google clones online. But with the right technology, hardware, and software (your own, or a commercial application like Inout Spider), your dream is achievable.

 


 
