Ship Simulator
English forum => Small talk => Topic started by: IRI5HJ4CK on March 22, 2009, 14:03:29
-
Hi guys,
I came across this new search engine called "Cuil", and I find it to be very good!
http://www.cuil.com/
It's a little more "Flash" than Google with the results. I've only just tried it out, so it may not be as good results/info-wise. But I have to say, so far it seems on par with Google as a search engine. The only thing is, it doesn't seem to search images, but then again, I'm not the technical person :lol:
I also found out something rather interesting: it is developed by former employees of Google, and they claim to have the largest search index.
Try it out if you like. Thought I'd let you guys know about it, if you don't already know!
Jack :)
P.S. Another thing you may find interesting is that "Cuil" means knowledge in Irish.
-
Cuil was built to "kill" Google. ;)
Plus it's got a bad rep among webmasters for requesting non-existent pages and using a lot of bandwidth.
Not to be all negative, but I did ask them to stop indexing my site as it uses up a lot of bandwidth. They never stopped, still haven't, and haven't had the courtesy to even reply to me. :-\
Sorry, but I prefer Google. It's well known and trustworthy, well, more trustworthy than Cuil anyway.
-
Oh dear, that doesn't sound good :-[ :-\
Should I delete the topic?
Jack.
-
Cuil was built to "kill" Google. ;)
Plus it's got a bad rep among webmasters for requesting non-existent pages and using a lot of bandwidth.
Not to be all negative, but I did ask them to stop indexing my site as it uses up a lot of bandwidth. They never stopped, still haven't, and haven't had the courtesy to even reply to me. :-\
Sorry, but I prefer Google. It's well known and trustworthy, well, more trustworthy than Cuil anyway.
Did you put a Cuil-specific tag in the robots file?
-
I know that the owners of such search engines are watching what we are searching so I always Stick It To The Man by searching Google, Yahoo, Dogpile...etc... ;D
-
Did you put a Cuil-specific tag in the robots file?
Tried that but they somehow got around that.
-
Tried that but they somehow got around that.
Hmmm. I put "robots.txt cuil" into Google and almost drowned in the complaints. Lots of adverse publicity about this one. e.g. "If don't want to find what you're after - Use Cuil". And loads of complaints from Webmasters about their crawler.
But, I found this page on Cuil which might be of interest. They are using Twiceler to crawl over your pages. They reckon they have a blocking mechanism, so follow the links on the page.
http://www.cuil.com/info/webmaster_info/
-
It looks nicer than Google.
-
It looks nicer than Google.
Looks can be deceiving... :evil:
-
Looks can be deceiving... :evil:
Dun dun dun...
;D
-
Hmmm. I put "robots.txt cuil" into Google and almost drowned in the complaints. Lots of adverse publicity about this one. e.g. "If don't want to find what you're after - Use Cuil". And loads of complaints from Webmasters about their crawler.
But, I found this page on Cuil which might be of interest. They are using Twiceler to crawl over your pages. They reckon they have a blocking mechanism, so follow the links on the page.
http://www.cuil.com/info/webmaster_info/
I looked at that too, Terry.
They also said that if you email them asking them to stop crawling/indexing your site, they will put your site on the "not to crawl" list. But as I said, I tried that too and it failed. :-\
And they also have a crawler named "cuil"; I've seen it on a list of search engines that have crawled elitefun.
-
And it crawls every 2 hours, really annoying...
-
If you have the facility, you can perhaps block their IP addresses. They do show them on that page I linked, above. (I assume that they actually know their own IP addresses, of course).
-
If you have the facility, you can perhaps block their IP addresses. They do show them on that page I linked, above. (I assume that they actually know their own IP addresses, of course).
Wouldn't the search results for the site then read the error page in the description and title? Would rather just have the thing be gone.
I have no clue if James's hosting has cPanel (or equivalent).
-
Wouldn't the search results for the site then read the error page in the description and title? Would rather just have the thing be gone.
They shouldn't. Otherwise all search engines would consist mainly of "404" pages.
-
Um...
http://www.cuil.com/search?q=404+Page+not+found
-
I think they must be somewhat "young" in their approach...
-
I'll try that Terry. Thanks!
-
Here is the robots.txt I use for the site I manage. It blocks everything but Google, Yahoo and the Internet Archive.
Google and Yahoo go through about twice a week. The Internet Archive has not gone through yet. The rest do not get through.
User-agent: Google
Disallow: /addon-modules/
Disallow: /activities/
Disallow: /dist11/
Disallow: /images/

User-agent: Yahoo
Disallow: /addon-modules/
Disallow: /activities/
Disallow: /dist11/
Disallow: /images/

User-agent: ia_archiver
Disallow: /addon-modules/
Disallow: /activities/
Disallow: /dist11/
Disallow: /images/

User-agent: *
Disallow: /
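If you want to check how rules like these get interpreted, here is a minimal Python sketch using the standard library's urllib.robotparser. It is only an illustration: the rules are a shortened copy of the file above (two folders per group), the helper name is_allowed is made up for this example, and the test paths are hypothetical. Python's parser matches a User-agent token as a case-insensitive substring of the crawler name, which is why "Google" applies to "Googlebot" here; a real crawler may match differently, and nothing in this check says whether a crawler actually honours the file.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above (assumption: two folders per group
# are enough to show the behaviour).
ROBOTS_TXT = """\
User-agent: Google
Disallow: /addon-modules/
Disallow: /images/

User-agent: ia_archiver
Disallow: /addon-modules/
Disallow: /images/

User-agent: *
Disallow: /
"""


def is_allowed(user_agent, path):
    """Return True if ROBOTS_TXT lets `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "twiceler" has no group of its own, so it falls under "User-agent: *".
    for agent in ("Googlebot", "ia_archiver", "twiceler"):
        for path in ("/index.html", "/images/logo.png"):
            print(agent, path, is_allowed(agent, path))
```

Any crawler without its own group falls through to the catch-all "User-agent: *" group, which is what keeps the rest out, provided they read the file at all.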
-
That works with all "mature" search engines. But if someone writes a crawler that doesn't check the file, or that chooses to ignore the instructions, then there is nothing to force it to do that, except an IP address ban.
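To make the IP ban idea concrete, here is a rough Python sketch of the check a ban boils down to: is the visitor's address inside one of the blocked ranges? The ranges below are placeholders from the reserved documentation blocks, not the crawler's real addresses, and is_blocked is a made-up helper name. In practice the ban would normally live in the web server or firewall configuration rather than in a script.

```python
import ipaddress

# Hypothetical ranges (reserved documentation blocks). Replace them with
# the addresses the crawler operator actually publishes.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.16/28"),
]


def is_blocked(remote_addr):
    """Return True if `remote_addr` falls inside any blocked network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in BLOCKED_NETWORKS)


if __name__ == "__main__":
    for candidate in ("192.0.2.77", "198.51.100.20", "203.0.113.5"):
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```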
-
That works with all "mature" search engines. But if someone writes a crawler that doesn't check the file, or that chooses to ignore the instructions, then there is nothing to force it to do that, except an IP address ban.
You are correct; however, their webmaster info page says the crawler supports the robots.txt file. In that case it should be possible to limit it to specific pages on your site and restrict the rest.
To do that, you have to use
User-agent: twiceler
followed by a Disallow line for each folder in the root that you want left untouched.
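If there are a lot of folders, the group can be generated rather than typed by hand. This is just a small Python sketch: build_robots_group is a made-up helper, and the folder names in the usage example are borrowed from the robots.txt posted earlier in the thread purely for illustration.

```python
def build_robots_group(user_agent, folders):
    """Build one robots.txt group: a User-agent line plus one Disallow per folder."""
    lines = ["User-agent: " + user_agent]
    for folder in folders:
        # Normalise "images", "/images" and "/images/" to "/images/".
        path = "/" + folder.strip("/") + "/"
        lines.append("Disallow: " + path)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Folder names are illustrative only.
    print(build_robots_group("twiceler", ["images", "/activities/"]))
```

The output can be pasted into robots.txt as its own group, separated from the other groups by a blank line.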
-
It's quite a bit faster than Google.
BTW since Google now has intimately detailed pictures, should it be called GO OGLE?
-
Sorry to be a pain
What does crawling mean?
What happens when this website does it?
-
Crawling is when a search engine like Google visits a website and indexes its pages so they can be shown in its search results.
As for this particular search engine, it looks like it's hogging network resources a bit too often, making the site slow down or possibly time out.
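To give a feel for the core step, here is a very simplified Python sketch of what a crawler does with each page it fetches: pull out the links so it knows where to go next. It is not how any real search engine is implemented. The class and function names are made up, the URLs are placeholders, and the sketch works on an HTML string instead of downloading anything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URLs of all <a href="..."> links in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


if __name__ == "__main__":
    page = '<a href="/about.html">About</a> <a href="news/">News</a>'
    print(extract_links("http://example.com/", page))
```

A crawler repeats this for every link it finds, which is why a badly behaved one can generate so many requests against a single site.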
-
I see. Thanks, mate!
-
Here is what a crawl looks like from my http access log.
ipaddress - - [21/Mar/2009:13:59:48 -0500] "GET /robots.txt HTTP/1.0" 200 603 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
ipaddress - - [21/Mar/2009:13:59:49 -0500] "GET /htj.html HTTP/1.0" 200 7552 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
ipaddress - - [21/Mar/2009:14:26:20 -0500] "GET /robots.txt HTTP/1.1" 200 603 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
ipaddress - - [21/Mar/2009:14:30:36 -0500] "GET /robots.txt HTTP/1.1" 200 603 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"
Here is an entry from a web browser:
ipaddress - - [22/Mar/2009:15:40:26 -0500] "GET /style.css HTTP/1.1" 200 3059 "http://farriscreeklodge.org/" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.6) Gecko/2009021619 Mandriva/1.9.0.6-0.1mdv2009.0 (2009.0) Firefox/3.0.6"
ipaddress - - [22/Mar/2009:15:40:26 -0500] "GET /images/valid-xhtml10-blue.png HTTP/1.1" 200 2026 "http://farriscreeklodge.org/" "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.6) Gecko/2009021619 Mandriva/1.9.0.6-0.1mdv2009.0 (2009.0) Firefox/3.0.6"
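If you want to see which crawlers are hitting a site and how often, lines like these can be picked apart with a short script. Below is a minimal Python sketch, assuming the log uses the same layout as the entries above (host, two dashes, timestamp, request, status, size, referer, user agent). The pattern and the function names are made up for this example, and the sample lines are copied from the entries above.

```python
import re
from collections import Counter

# Assumed layout: host - - [time] "METHOD path protocol" status size "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINES = [
    'ipaddress - - [21/Mar/2009:13:59:48 -0500] "GET /robots.txt HTTP/1.0" 200 603 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    'ipaddress - - [21/Mar/2009:13:59:49 -0500] "GET /htj.html HTTP/1.0" 200 7552 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    'ipaddress - - [21/Mar/2009:14:26:20 -0500] "GET /robots.txt HTTP/1.1" 200 603 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
]


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def requests_per_agent(lines):
    """Count how many requests each user agent made."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry:
            counts[entry["agent"]] += 1
    return counts


if __name__ == "__main__":
    for agent, count in requests_per_agent(SAMPLE_LINES).most_common():
        print(count, agent)
```

Bear in mind the user agent is whatever the client chooses to send, so a crawler that ignores robots.txt may not identify itself honestly either.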
-
that is very interesting!!!!!!!!! :o :o :o
-
Does anyone know if Google's primary crawler is a spider or something else? I like to track when my website gets crawled.
-
http://www.google.com/support/webmasters/?hl=en