I'm trying to figure out how domains are read by search engines.
For example,
www.ABC.com is my site
www.XYZ.com is my other site
But I only have one server.
www.ABC.com/XYZ is the actual location of XYZ.com
I used URL masking so site visitors don't know any better.
When they type in XYZ.com, all they see is XYZ.com in the address bar
So... will Google index XYZ.com by crawling XYZ.com, or by using the true raw www.ABC.com/XYZ folder? Or both?
And if so, should there be multiple robots.txt files? Will that work?
Does Googlebot treat the forward as the raw domain, and not a subfolder (as is its true nature online)?
Help.
-
Want my help? Ask here! (not via PM!)
FAQs: Best Blank Discs • Best TBCs • Best VCRs for capture • Restore VHS -
It'll probably crawl the site both ways.
As for your question about treating it as a raw domain you'd probably have to ask Google that. -
The first thing you must ensure is that your web server doesn't allow directory indexes. This will hide all your site directories by default; no one should know your physical server directory structure.
Now the only way crawlers can work out your "site" structure is to trace the links on your site, starting from the site-root path.
The crawlers can only start from WWW.ABC.COM and WWW.XYZ.COM, they have no way of knowing that the root of WWW.XYZ.COM is "WWW.ABC.COM/XYZ" unless you have a link (or reference to it) somewhere on your site that reveals this.
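To illustrate the point about tracing links, here is a minimal sketch in Python. It assumes a crawler that only follows links from its starting addresses; the link graph and the page names in it (such as `guides.htm`) are made up for the example. Real crawlers can also pick up URLs from other sources, such as links on other people's sites.

```python
from collections import deque

# Hypothetical link graph: each public URL maps to the URLs it links to.
LINKS = {
    "http://www.ABC.com/": ["http://www.ABC.com/guides.htm"],
    "http://www.ABC.com/guides.htm": ["http://www.XYZ.com/"],
    "http://www.XYZ.com/": ["http://www.XYZ.com/index.htm"],
    "http://www.XYZ.com/index.htm": [],
}

def crawl(seeds, links):
    """Breadth-first walk that returns every URL reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

seeds = ["http://www.ABC.com/", "http://www.XYZ.com/"]
found = crawl(seeds, LINKS)
print("http://www.ABC.com/XYZ/" in found)  # False: nothing links to it

# A single stray link is enough to reveal the real folder.
leaky = dict(LINKS)
leaky["http://www.ABC.com/guides.htm"] = ["http://www.ABC.com/XYZ/"]
print("http://www.ABC.com/XYZ/" in crawl(seeds, leaky))  # True
```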
By the way what's the default site attached to your IP address? What will the crawlers see when they come in via IP.
Be careful using "robots.txt", not all crawlers honour the file, and by inserting entries in this you are actually revealing the existence of the directories that you want to keep secret. You only need to put those directories in the robots.txt that can be reached through your public links. For example you would have many links (and references) to directory "/images/" so if you don't want your images indexed then put this in the robots.txt, but if there is no reference anywhere to directory "/XYZ/" don't expose it by putting it in the file. -
It's paid hosting. I don't control the server.
I've never referenced/linked to the ABC.COM/XYZ, but google has indexed it that way many times. I always link it as XYZ.com (if I link at all!) -
OK Hosted - all is not lost yet.
How is it showing in Google?
I'm assuming that if, for example, you have "index.htm" in your XYZ.COM site, then Google is showing it as "http://ABC.COM/XYZ/index.htm" instead of "http://XYZ.COM/index.htm". Or does it show both?
If it is only the first option, then your host's DNS must show the path that way rather than have a virtual site pointing directly to the XYZ directory. If it's both, then index listings must be allowed. You may be able to control this if you can set directory permissions. For example, if it is Apache you may be able to create a ".htaccess" file in your directories to disallow indexes.
Also if it's both then your original idea of placing a robots.txt file in the ABC.COM root directory to exclude the "XYZ" directory may do the trick. -
htaccess is not allowed; these are Windows servers, which don't use htaccess.
If I block www.ABC.com/XYZ, would it still see XYZ.com in google? That's my question. It would take 6 months to test on my own, at minimum as they give 90 days as the refresh time
Or would placing a robots.txt inside the XYZ folder be seen as the robots.txt for XYZ.com?
Google sees ABC.com, XYZ.com and ABC.com/XYZ
All 3 are found.
I also know not everybody uses robots.txt as they should but all the major ones do, google, MSN, altavista, etc. The ones I care about most. -
Windows IIS requires different config - we won't go there.
The way 'robots.txt' is supposed to work is that you put it in the root of your site ABC.COM and it controls everything within that site, so you can exclude directory XYZ from it.
Site XYZ.COM should be seen as a different site, and if Google comes in via that address then the 'robots.txt' in ABC.COM should have no effect. (It belongs to a different site.)
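As a rough check of that behaviour, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents are an assumption (a single rule excluding `/XYZ/` on ABC.COM), and `robots_url` is a helper made up for the example. It only shows how a well-behaved crawler would read the file, not what Google actually does.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def robots_url(page_url):
    """Hypothetical helper: a crawler looks for robots.txt at the root of the host it is visiting."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"

# Assumed contents of the robots.txt in the root of www.ABC.com
abc_rules = RobotFileParser()
abc_rules.parse([
    "User-agent: *",
    "Disallow: /XYZ/",
])

# Each hostname gets its own robots.txt lookup.
print(robots_url("http://www.ABC.com/XYZ/index.htm"))  # http://www.ABC.com/robots.txt
print(robots_url("http://www.XYZ.com/index.htm"))      # http://www.XYZ.com/robots.txt

# The ABC.COM rules block the subfolder path but not the rest of ABC.COM.
print(abc_rules.can_fetch("Googlebot", "http://www.ABC.com/XYZ/index.htm"))  # False
print(abc_rules.can_fetch("Googlebot", "http://www.ABC.com/index.htm"))      # True
```

Because the lookup is per hostname, a crawler arriving via XYZ.COM would ask for http://www.XYZ.com/robots.txt. Which file on disk answers that request depends on how the host maps the domain to the folder.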
You may also find this of use -> http://member.melbpc.org.au/~tgosbell/articles/google-exclusion/ -
Be careful LS, you could put the kibosh on your main site as far as Google is concerned. I'm not sure of the specifics, but linking sites on the same server is bad news..... http://www.google.com/search?sourceid=navclient&ie=UTF-8&oe=UTF-8&q=google+%22same+server%22+penalty
Edit: Not good...... If I remember correctly your site had a PR5, which is good; you now have a 2, which is not good. Most main pages are assigned a 3 right from the start. www.webmasterworld.com is a good place to find out info on PR.....
Edit 2: Your other link shows a PR4.... you're putting the kibosh on your main page, as I stated above.
-
Interesting. There is some space donation I do off the server, and it needs to stay hidden. I also route several domain names to the various donated folders. My site has several domains hitting folders too. Most people know www.nomorecoasters.com, which is just a forward to one of the most popular pages on the site.
I've pretty much come to the conclusion that google sucks ass as much as ebay does. They make up rules as they go along, and could care less who they screw in the process.
While my page rank would be nice, I'm more worried about the secondary domains that need to stay unlinked and unindexed/uncrawled. Archive.org is the biggest pain in the ass, requires robots.txt to block it.
I've got to work on lots of these things this month. -
Originally Posted by lordsmurf
I'm even afraid to touch my coal site cause it's sitting right where I want it. -
Yeah, like I said, my google rank is small fries compared to everything else.