DefenceTalk.com
User
Jul 22, 2005, 7:38 AM
Post #1 of 36
Google News bot doesn't crawl the news page anymore?
I don't know what is wrong here, but since July 19th Google has stopped crawling our news page for news items. I have no idea what changed; at least nothing changed on our side.

http://www.defencenews.info

Has Google made any changes to their code? What could be causing this? *distressed*

Thanks!

Cliff
Staff

Jul 22, 2005, 4:46 PM
Post #2 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Hi DefenceTalk.com,

Thanks for posting. I've taken a look at your pages and I don't see any reason why Google would stop crawling them. I'm just curious what tipped you off that Google wasn't hitting the site any more. Was it something in the log files for your site, or was it something from Google that made you notice?

Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com

DefenceTalk.com
User
Jul 22, 2005, 6:08 PM
Post #3 of 36
Re: [Cliff] Google News bot doesn't crawl the news page anymore?
Well, after I publish articles, I check Google after 10-15 minutes. Usually the news shows up on Google after 3-4 minutes, but it hasn't crawled the NEWS page since the 20th.

Also, the publishcron modification that I added to the interface templates is not updating the articles anymore, and I have to run the URLs for each page in order to get them to update. Could the reason be that the pages are now being updated statically rather than dynamically?

MikeB
Staff / Moderator

Jul 23, 2005, 5:06 PM
Post #4 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Hi,

Thanks for the post! I took another look at your site and it doesn't look like any of the content on there would be preventing Google from indexing it. While I'm not too sure of Google's requirements, it seems that it shouldn't make a difference whether the page is created by publishcron in the admin program or by you, as the page being created is still a static .php page on the server.

If you usually notice your content being indexed faster than this, you may want to take a look at Google's search guidelines or even get in touch with them to see if they have any details about their indexing.

I hope this helps, and if you have any other questions feel free to let me know!

Cheers,
Mike Briggs - Product Specialist
support@interactivetools.com
Hire me! Save time by getting our experts to help with your project. Template changes, advanced features, full integration, whatever you need. Whether you need one hour or fifty, get it done fast with Priority Consulting.

DefenceTalk.com
User
Jul 24, 2005, 6:54 PM
Post #5 of 36
Re: [MikeB] Google News bot doesn't crawl the news page anymore?
Mike, thanks. That's what shocks me too, because Google has had no problem crawling the website or the news section for so long, and all of a sudden it stopped after July 19th. I have sent emails to the Google News team and our host is also looking into this.

I have watched the visitors coming to the site today, and Googlebot always seems to hit the .shtml pages, which do not exist. It also seems to be going after the "temp_" files rather than the real files.

I have also seen Google hitting http://defencetalk.com/cgi-bin/news/exec/search.cgi?cat=17&start=14&perpage=13&template=index/default-cat-wmd.html and similar search-type pages, but nothing else...
(This post was edited by DefenceTalk.com on Jul 24, 2005, 8:20 PM)

ross
Staff / Moderator

Jul 26, 2005, 8:51 AM
Post #6 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Hi there.

Thanks for keeping us up to date! I know that Google likes to update its indexing algorithms on a fairly regular basis, so the way your pages get indexed will definitely change from time to time.

I noticed that you have the same meta keywords and descriptions on each page. Do you think changing them slightly might get Google to realize that a change has been made and that you have new material to index?

With your publishcron, had you used Theo's workaround for getting it to run automatically?

http://www.interactivetools.com/forum/gforum.cgi?post=13669;search_string=publishcron;t=search_engine#13669

If anyone else in the community has more experience working with Google indexing, could you let us know? We would really appreciate the help.

Keep me up to date, Defencetalk.

-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com

DefenceTalk.com
User
Jul 26, 2005, 9:44 AM
Post #7 of 36
Re: [ross] Google News bot doesn't crawl the news page anymore?
Thanks Ross. I did get a reply back from the Google News team. Excerpts:

We cannot include your site in Google News at this time because your articles are set up as posts or threads.

1. Set up your site in standard HTML format.
2. Each page that displays an article's full text needs a unique, static URL. We cannot include sites in Google News that display multiple articles from the same URL.
3. The URL for each article must contain a unique number consisting of at least three digits. For example, our news crawler would not crawl these URLs:
   http://www.google.com/lemurs_in_the_mist.html
   http://www.google.com/news/article23.html
   It would crawl these pages:
   http://www.google.com/news/08112003/article.html
   http://www.google.com/news/lemurs_in_the_mist/23467.html
4. Don't include a date in the URL. These URLs often change regularly. Since we're unable to detect the most current URL, we can't crawl the site for new content.
5. Create an HTML, text-based link structure. We are unable to crawl links pointing from graphics or embedded in JavaScript.

Reply that I sent:

ross
Staff / Moderator

Jul 27, 2005, 8:42 AM
Post #8 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Hi Defencetalk.

Thanks for keeping us up to date! It seems as though Google may have misinterpreted your original email, so hopefully they are able to have a better look this time. If you could post their second response for all of us, that would be great.

I look forward to hearing from you.

-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com

fmg
User
Aug 14, 2005, 12:42 AM
Post #9 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Google is very strict about the way you implement redirects...

carminejg3
User
Nov 19, 2005, 12:34 PM
Post #10 of 36
Re: [fmg] Google News bot doesn't crawl the news page anymore?
Hmmm. I noticed our articles used to appear in Google News, but they have been dropped. I never thought of it being the URL name. So this is saying they want unique numbers? That's pointless. We have unique article names. That's good enough for me.

Webmaster
http://news.carjunky.com

DefenceTalk.com
User
Nov 21, 2005, 8:23 AM
Post #11 of 36
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?
With all due respect to Google News/Google.... Google's way of determining "original content" is the reason our site was dropped... it is a totally absurd, stupid and asinine way of figuring out which site has "original" content and which doesn't.

I pointed out to Google a few times that the reason they were dropping and not adding our site applies equally to the 100s and 1000s of websites they are still crawling news from, and the only response I got from them was "we will look into this.... thank you."

Common sense is something you may not want to try when dealing with the Google News folks. Anyhow, our visitors and page views have increased and are increasing every month through RSS and other news crawlers, so Google News can stick it where the sun doesn't shine.

carminejg3
User
Nov 21, 2005, 8:41 AM
Post #12 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
lol, I feel your pain. Google is wearing a lot of hats right now, and the scary part is that they hold the cards with a 51% share of the search market....

The big plus of the RSS feeds is that you have the AdSense scrapers (simple sites looking to get clicks) that will take your RSS feed and use your articles to pull in traffic; at least this will increase your links. THE BONUS TO GOOGLE....

Yahoo, MSN and the rest are starting to be less hard to work with. We were dropped from Yahoo search... and recently we have started to regain our footing. I think it's the Google factor.

As for the 3 numbers in the title, that is just dumb. Most papers have humans inserting the articles, who can customize the name, so why in the world would they insert numbers?

/articles/1234.html

when they could do

/articles/car_insurance_facts.html

1234.html is good for Google; the second is good for the rest.

I'm betting that MSN and Yahoo are working hard and fast to regain some kind of footing in the search market. So I'm staying with my custom names.

Webmaster
http://news.carjunky.com

DefenceTalk.com
User
Nov 21, 2005, 8:58 AM
Post #14 of 36
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?
We are doing fine as far as search crawlers go (Yahoo, MSN, Google and all the others). News that isn't crawled immediately shows up in search after 2-3 days, which isn't bad.

carminejg3
User
Nov 21, 2005, 9:22 AM
Post #15 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
So it just doesn't show up in Google News? Also, how do you name your articles?

Webmaster
http://news.carjunky.com

DefenceTalk.com
User
Nov 21, 2005, 9:34 AM
Post #16 of 36
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?
No, it doesn't show up in Google News, but it does show up in Google search after 2-3 days.

Name for articles: article_002222.php

shalliday
User
Jan 16, 2006, 9:13 AM
Post #17 of 36
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?
Any suggestions on how we could best implement this with Article Manager? I am very interested in getting our site's news into Google News, but I'm not sure how best to proceed with setting it up.

I am also wondering: if I changed the publish directory name for a given category from, say, "artman" to "news", would Article Manager leave or delete the old directory and articles the next time "Publish All" was selected?

Scott

We cannot include your site in Google News at this time because your articles are set up as posts or threads.

1. Set up your site in standard HTML format.
2. Each page that displays an article's full text needs a unique, static URL. We cannot include sites in Google News that display multiple articles from the same URL.
3. The URL for each article must contain a unique number consisting of at least three digits. For example, our news crawler would not crawl these URLs:
   http://www.google.com/lemurs_in_the_mist.html
   http://www.google.com/news/article23.html
   It would crawl these pages:
   http://www.google.com/news/08112003/article.html
   http://www.google.com/news/lemurs_in_the_mist/23467.html
4. Don't include a date in the URL. These URLs often change regularly. Since we're unable to detect the most current URL, we can't crawl the site for new content.
5. Create an HTML, text-based link structure. We are unable to crawl links pointing from graphics or embedded in JavaScript.
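
For anyone who wants to sanity-check their own URLs against rule 3 quoted above, here is a rough Python sketch. It only looks for a run of three or more consecutive digits in the URL path. The function name and the regular expression are assumptions made for illustration; Google's actual crawler logic is not public, and this does not check uniqueness or the "no dates" rule.

```python
import re
from urllib.parse import urlparse

# Assumption: "a number consisting of at least three digits" is approximated
# as "the URL path contains a run of 3 or more consecutive digits".
THREE_OR_MORE_DIGITS = re.compile(r"\d{3,}")


def has_three_digit_number(url: str) -> bool:
    """Return True if the path part of the URL contains 3+ consecutive digits."""
    path = urlparse(url).path
    return THREE_OR_MORE_DIGITS.search(path) is not None


if __name__ == "__main__":
    # The four example URLs taken from the quoted reply.
    examples = [
        "http://www.google.com/lemurs_in_the_mist.html",
        "http://www.google.com/news/article23.html",
        "http://www.google.com/news/08112003/article.html",
        "http://www.google.com/news/lemurs_in_the_mist/23467.html",
    ]
    for url in examples:
        print(has_three_digit_number(url), url)
```

On the four example URLs the check agrees with the reply: the first two fail and the last two pass.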
(This post was edited by shalliday on Jan 16, 2006, 9:16 AM)

Cliff
Staff

Jan 16, 2006, 3:18 PM
Post #18 of 36
Re: [shalliday] Google News bot doesn't crawl the news page anymore?
Hi shalliday,

Thanks for posting. Google's requirements are pretty simple -- they just want the numbering done a certain way. Article Manager, by default, will publish an article in the following format:

article_##.shtml

So, you'll just have to make sure that there are always at least three digits there. The difficulty comes with Article Manager giving filenames with fewer than three digits. So, my recommendation would be to simply tell it to add in some extra digits.

To do this, go into the Setup Options and choose Server. Then scroll down to Publish File Settings. Find "Article Pages", where the default is set to

article_

Change this to

article_00

That way, instead of article_1.shtml, it will create article_001.shtml. This will meet Google's requirements, and it's pretty easy to get going.

Changing the publish directory will make Article Manager publish to a new location, so any files in the old directory will stay there until you remove them. There won't be any link from Article Manager to them any more though, so in effect they would be 'orphaned'.

I hope that helps!

Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com
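
To illustrate why the longer prefix works, here is a small Python sketch of the naming scheme described above (prefix, then article number, then extension). This is not Article Manager's own code; both helper functions are hypothetical and only mimic the filenames mentioned in this post.

```python
import re


def published_filename(prefix: str, article_num: int, ext: str = ".shtml") -> str:
    """Hypothetical helper: build a filename as prefix + number + extension."""
    return f"{prefix}{article_num}{ext}"


def longest_digit_run(filename: str) -> int:
    """Length of the longest run of consecutive digits in the filename."""
    runs = re.findall(r"\d+", filename)
    return max((len(run) for run in runs), default=0)


if __name__ == "__main__":
    for prefix in ("article_", "article_00"):
        for num in (1, 88, 2222):
            name = published_filename(prefix, num)
            print(name, "digits:", longest_digit_run(name))
```

With the "article_00" prefix, even article number 1 ends up with three digits (article_001.shtml), while the default "article_" prefix only reaches three digits once the article number itself does.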

shalliday
User
Jan 17, 2006, 1:44 PM
Post #19 of 36
Re: [Cliff] Google News bot doesn't crawl the news page anymore?
Thank you, Cliff.

I have started making the changes you outlined; however, many of the published articles have file names specified in the Filename field on the article page. Is there an easy way to wipe out the individual file names assigned to each article so they go back to the default name Article Manager generates, or do I have to edit each individual article to remove them?

Also, should the "Publish URL" field value change to equal the "Publish Dir" value?

Looking forward to your reply.

Thank you,
Scott
(This post was edited by shalliday on Jan 17, 2006, 1:48 PM)

Ginslinger
User
Jan 18, 2006, 12:03 PM
Post #20 of 36
Re: [Cliff] Google News bot doesn't crawl the news page anymore?
Hi Cliff, I tried your solution but end up with file names like this:

article_00%2088.shtml

Where am I picking up the %?

Cliff
Staff

Jan 18, 2006, 4:03 PM
Post #21 of 36
Re: [Ginslinger] Google News bot doesn't crawl the news page anymore?
Hi Ginslinger,

Thanks for posting. It looks to me like there may be a space at the end of your article name, so I'd double check that to make sure there isn't one. I suspect that should fix it for you, but let me know if you are still having trouble.

Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com

Ginslinger
User
Jan 18, 2006, 10:10 PM
Post #22 of 36
Re: [Cliff] Google News bot doesn't crawl the news page anymore?
Hi Cliff, thanks for the reply. Actually, the space must have been at the end of where I put _00, as I just put the cursor at the end of it and hit delete, and all is fine. Thanks.
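
For anyone else who runs into this: %20 is simply how a space gets percent-encoded in a URL. The sketch below reproduces the filename from this exchange with Python's standard `urllib.parse.quote`, assuming the prefix was saved with a trailing space; the helper name is hypothetical and this is not how Article Manager itself builds names.

```python
from urllib.parse import quote


def encoded_filename(prefix: str, article_num: int, ext: str = ".shtml") -> str:
    """Hypothetical helper: join the parts, then percent-encode the result."""
    return quote(f"{prefix}{article_num}{ext}")


if __name__ == "__main__":
    bad_prefix = "article_00 "        # note the stray trailing space
    good_prefix = bad_prefix.strip()  # same prefix with the space removed

    print(encoded_filename(bad_prefix, 88))   # article_00%2088.shtml
    print(encoded_filename(good_prefix, 88))  # article_0088.shtml
```

Removing the stray space from the prefix setting is all that is needed for the %20 to disappear.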

ross
Staff / Moderator

Jan 19, 2006, 11:12 AM
Post #24 of 36
Re: [shalliday] Google News bot doesn't crawl the news page anymore?
Hi Scott.

Thanks for posting! When you uncheck the filename box in Setup Options and hit Publish All, Article Manager should actually rename all your articles for you.

The publish URL and publish DIR are similar but will probably not be the same thing. The publish DIR is the path on your server that points to the publish folder. The publish URL is what you type into a web browser to see the publish folder.

Does that make sense? Let me know if you need any more details.

-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com

shalliday
User
Jan 19, 2006, 12:57 PM
Post #25 of 36
Re: [ross] Google News bot doesn't crawl the news page anymore?
Thank you very much. Worked like a charm. A little more work to clean up and I'll be ready to tackle the googlesitemap feed. Yahoo! Thanks again for your help.