Google News bot doesn't crawl the news page anymore?

 



DefenceTalk.com
User

Jul 22, 2005, 7:38 AM

Post #1 of 36 (4603 views)
Google News bot doesn't crawl the news page anymore?

I don't know what is wrong here, but after July 19th Google stopped crawling our news page for news items. I have no idea what changed; at least nothing changed on our side.

http://www.defencenews.info

Has Google made any changes to their code? What could be causing this?

*distressed*

Thanks!


Cliff
Staff


Jul 22, 2005, 4:46 PM

Post #2 of 36 (4592 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Hi DefenceTalk.com,

Thanks for posting.

I've taken a look at your pages and I don't see any reason why Google would stop crawling your pages.

I'm just curious what tipped you off that Google wasn't hitting the site any more. Was it something in the log files for your site or was it something from Google that made you notice?
Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com


DefenceTalk.com
User

Jul 22, 2005, 6:08 PM

Post #3 of 36 (4588 views)
Re: [Cliff] Google News bot doesn't crawl the news page anymore?

Well, after I publish articles, I check Google 10-15 minutes later. Usually the news shows up on Google after 3-4 minutes, but it hasn't crawled the NEWS page since the 20th.

Also, the publishcron modification that I added to the interface templates no longer updates the articles, and I have to load the URL for each page to get it to update. Could that be the reason, i.e. that the pages are now being updated statically rather than dynamically?


MikeB
Staff / Moderator


Jul 23, 2005, 5:06 PM

Post #4 of 36 (4571 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Hi,

Thanks for the post! :)

I took another look at your site and it doesn't look like any of the content on there would be preventing Google from indexing it.

While I'm not sure of all of Google's requirements, it shouldn't make a difference whether the page is created by publishcron in the admin program or by you, since the page being created is still a static .php page on the server.

If you usually notice your content being indexed faster than this you may want to take a look at Google's search guidelines or even get in touch with them to see if they have any details about their indexing.

I hope this helps, and if you have any other questions feel free to let me know! :)

Cheers,
Mike Briggs - Product Specialist
support@interactivetools.com


Hire me!
Save time by getting our experts to help with your project. Template changes, advanced features, full integration, whatever you need. Whether you need one hour or fifty, get it done fast with Priority Consulting.


DefenceTalk.com
User

Jul 24, 2005, 6:54 PM

Post #5 of 36 (4561 views)
Re: [MikeB] Google News bot doesn't crawl the news page anymore?

Mike, thanks.

That's what shocks me too, because Google has had no problem crawling the website or the news section for so long, and all of a sudden it stopped after July 19th. I have sent emails to the Google News team, and our host is also looking into this.

I have been watching visitors to the site today, and Googlebot always seems to hit .shtml pages, which do not exist.

It also seems to be going after the "temp_" files rather than the actual files. I have also seen Google hitting

http://defencetalk.com/cgi-bin/news/exec/search.cgi?cat=17&start=14&perpage=13&template=index/default-cat-wmd.html

and similar search-type pages, but nothing else...


(This post was edited by DefenceTalk.com on Jul 24, 2005, 8:20 PM)
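Cliff asked earlier whether the log files showed the problem. One rough way to see what Googlebot is actually requesting is to bucket its hits from the raw access log. This is a minimal Python sketch that assumes the host writes Apache's "combined" log format. The sample lines and IP addresses are made up, and the buckets simply mirror the .shtml, temp_ and search.cgi requests described above.

```python
import re
from collections import Counter

# Assumes Apache/NCSA "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def classify(path: str) -> str:
    """Bucket a requested path into the kinds of URLs described in the post."""
    name = path.split("?", 1)[0].rsplit("/", 1)[-1]
    if name == "search.cgi":
        return "search.cgi"
    if name.startswith("temp_"):
        return "temp_ file"
    if name.endswith(".shtml"):
        return ".shtml page"
    if name.startswith("article_") and name.endswith(".php"):
        return "article page"
    return "other"


def googlebot_hits(lines):
    """Count Googlebot requests per (URL bucket, HTTP status) pair."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m.group("agent"):
            counts[(classify(m.group("path")), m.group("status"))] += 1
    return counts


# Made-up sample lines, for illustration only.
SAMPLE = [
    '192.0.2.10 - - [24/Jul/2005:18:54:00 -0700] "GET /news/publish/article_002718.shtml HTTP/1.1" 404 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [24/Jul/2005:19:02:11 -0700] "GET /cgi-bin/news/exec/search.cgi?cat=17&start=14 HTTP/1.1" 200 8192 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [24/Jul/2005:19:05:40 -0700] "GET /news/publish/index.php HTTP/1.1" 200 20480 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

for (bucket, status), n in sorted(googlebot_hits(SAMPLE).items()):
    print(f"{bucket:12} {status} {n}")
```

A pile of 404s on .shtml URLs would suggest Googlebot is following old links rather than the current .php pages.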


ross
Staff / Moderator


Jul 26, 2005, 8:51 AM

Post #6 of 36 (4544 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Hi there.

Thanks for keeping us up to date!

I know that Google updates its indexing algorithms on a fairly regular basis, so the way your pages get indexed will definitely change from time to time. I noticed that you have the same meta keywords and description on every page. Do you think changing them slightly might get Google to realize that a change has been made and that you have new material to index?
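Ross's suggestion amounts to giving each article its own description and keywords instead of one site-wide set. The thread doesn't show Article Manager's template placeholders for this, so the following is only a standalone Python sketch: `meta_tags` is a hypothetical helper that builds the two tags from an article's summary and keyword list, escaping both for HTML.

```python
from html import escape


def meta_tags(summary: str, keywords: list[str]) -> str:
    """Hypothetical helper: build per-article description/keywords meta tags."""
    description = " ".join(summary.split())  # collapse runs of whitespace and newlines
    return "\n".join([
        f'<meta name="description" content="{escape(description)}">',
        f'<meta name="keywords" content="{escape(", ".join(keywords))}">',
    ])


print(meta_tags("Navy & Army hold joint\n drill", ["navy", "army", "exercise"]))
```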

With your publishcron, had you used Theo’s workaround for getting it to run automatically?

http://www.interactivetools.com/forum/gforum.cgi?post=13669;search_string=publishcron;t=search_engine#13669

If anyone else in the community has more experience working with Google indexing, could you let us know? We would really appreciate the help.

Keep me up to date, DefenceTalk :)
-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com

DefenceTalk.com
User

Jul 26, 2005, 9:44 AM

Post #7 of 36 (4541 views)
Re: [ross] Google News bot doesn't crawl the news page anymore?

Thanks Ross.

I did get a reply back from the Google News team:

excerpts:


Quote


We cannot include your site in Google News at this time because your articles are set up as posts or threads.

1. Set up your site in standard HTML format.

2. Each page that displays an article's full text needs a unique, static URL. We cannot include sites in Google News that display multiple articles from the same URL.

3. The URL for each article must contain a unique number consisting of at least three digits.

For example, our news crawler would not crawl these URLs: http://www.google.com/lemurs_in_the_mist.html
http://www.google.com/news/article23.html

It would crawl these pages: http://www.google.com/news/08112003/article.html
http://www.google.com/news/lemurs_in_the_mist/23467.html

4. Don't include a date in the URL. These URLs often change regularly. Since we're unable to detect the most current URL, we can't crawl the site for new content.

5. Create an HTML, text-based link structure. We are unable to crawl links pointing from graphics or embedded in JavaScript.



Reply that I sent:

Quote
Hello Google News Team,

I beg your pardon, but our news section is not posts and threads. Please check again... it was being crawled just fine before the 20th of July.

The articles are in this format: http://www.defencetalk.com/news/publish/article_002718.php
The main page is: http://www.defencetalk.com/news/publish/index.php
Our threads and posts, aka forums, are located here: http://www.defencetalk.com/forums

Please see here for news articles crawled on the 19th and before: http://news.google.com/news?q=defencetalk&hl=en&lr=&rls=GGLG,GGLG:2005-20,GGLG:en&tab=wn&ie=UTF-8&scoring=d

Please reconsider.

Thank You,
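Rule 3 in Google's reply is mechanical enough to check automatically. Below is a minimal Python sketch that tests only rule 3 (a run of at least three consecutive digits in the URL), against Google's own examples and the DefenceTalk article URL. It looks only at the path, which is an assumption (a domain with digits in it shouldn't count). It does not try to enforce rule 4, because Google's own "would crawl" example `08112003/article.html` contains a date-like number, and uniqueness across the site can't be judged from a single URL.

```python
import re
from urllib.parse import urlparse


def has_three_digit_id(url: str) -> bool:
    """True if the URL path contains a run of at least three consecutive digits."""
    return re.search(r"\d{3,}", urlparse(url).path) is not None


examples = {
    "http://www.google.com/lemurs_in_the_mist.html": False,         # no number at all
    "http://www.google.com/news/article23.html": False,             # only two digits
    "http://www.google.com/news/08112003/article.html": True,
    "http://www.google.com/news/lemurs_in_the_mist/23467.html": True,
    "http://www.defencetalk.com/news/publish/article_002718.php": True,
}
for url, expected in examples.items():
    print(has_three_digit_id(url) == expected, url)
```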



ross
Staff / Moderator


Jul 27, 2005, 8:42 AM

Post #8 of 36 (4517 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Hi Defencetalk.

Thanks for keeping us up to date!

It seems as though Google may have misinterpreted your original email, so hopefully they will take a better look this time.

If you could post their second response for all of us, that would be great :)

I look forward to hearing from you.
-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com

fmg
User

Aug 14, 2005, 12:42 AM

Post #9 of 36 (4386 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Google is very strict about the way you implement redirects...


carminejg3
User

Nov 19, 2005, 12:34 PM

Post #10 of 36 (3670 views)
Re: [fmg] Google News bot doesn't crawl the news page anymore?

Hmmm. I noticed our articles used to appear in Google News but have been dropped. I never thought it could be the URL name.

So this is saying they want unique numbers? That's pointless. We have unique article names; that's good enough for me.


Webmaster ;)
http://news.carjunky.com


DefenceTalk.com
User

Nov 21, 2005, 8:23 AM

Post #11 of 36 (3641 views)
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?

With all due respect to google news/google....

Google's way of determining "original content" is the reason our site was dropped... it is a totally absurd, stupid and asinine way of figuring out which site has "original" content and which doesn't.

I pointed out to Google a few times that the reason they gave for dropping (and not re-adding) our site applies equally to the hundreds or thousands of websites they are still crawling news from, and the only response I got from them was "we will look into this.... thank you."

Common sense is something you may not want to try when dealing with the Google News folks.

Anyhow, our visitors and page views have increased, and are increasing every month, through RSS and other news crawlers, so Google News can stick it where the sun doesn't shine.


carminejg3
User

Nov 21, 2005, 8:41 AM

Post #12 of 36 (3638 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

lol, I feel your pain. Google is wearing a lot of hats right now, and the scary part is that they hold the cards with a 51% share of the search market....

The big plus of RSS feeds is that AdSense scrapers (simple sites looking to get clicks) will take your RSS feed and use your articles to pull in traffic; at least this will increase your links.

THE BONUS TO GOOGLE....

Yahoo, MSN and the rest are becoming easier to work with. We were dropped from Yahoo search... and recently we have started to regain a foothold. I think it's the Google factor.

As for the three numbers in the name, that's just dumb. Most papers have humans entering the articles who can customize the name, so why in the world would they insert numbers, /articles/1234.html, when they could use /articles/car_insurance_facts.html?

1234.html is good for Google; the second is good for the rest.

I'm betting that MSN and Yahoo are working hard and fast to regain a foothold in the search market. So I'm staying with my custom names.


Webmaster ;)
http://news.carjunky.com
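One way to keep descriptive names and still satisfy rule 3 is the pattern Google itself lists as crawlable, `lemurs_in_the_mist/23467.html`: a readable slug followed by a numeric ID. This is a Python sketch of that idea; the `/articles/` prefix, the underscore slug style and the five-digit padding are arbitrary choices for illustration, not something Article Manager produces.

```python
import re


def article_url(article_id: int, title: str) -> str:
    """Readable slug directory plus a zero-padded numeric file name (always >= 3 digits)."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return f"/articles/{slug}/{article_id:05d}.html"


print(article_url(42, "Car Insurance Facts"))  # /articles/car_insurance_facts/00042.html
```

The slug keeps the URL meaningful to readers and other engines, while the trailing number is what the three-digit rule looks for.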


carminejg3
User

Nov 21, 2005, 8:54 AM

Post #13 of 36 (3637 views)
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?

I emailed Google asking what exactly they are looking for.

I'll keep everyone posted. I think they want a URL like this:

/2005/11/xm-and-sirius-satellite-radio-giants-go-head-to-head/

but we'll see.


Webmaster ;)
http://news.carjunky.com


DefenceTalk.com
User

Nov 21, 2005, 8:58 AM

Post #14 of 36 (3636 views)
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?

We are doing fine as far as search crawlers go (Yahoo, MSN, Google and all the others).

News that isn't crawled immediately shows up in search after 2-3 days, which isn't bad.


carminejg3
User

Nov 21, 2005, 9:22 AM

Post #15 of 36 (3634 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

So it just doesn't show up in Google News?

Also, how do you name your articles?


Webmaster ;)
http://news.carjunky.com


DefenceTalk.com
User

Nov 21, 2005, 9:34 AM

Post #16 of 36 (3632 views)
Re: [carminejg3] Google News bot doesn't crawl the news page anymore?

No, it doesn't show up in Google News, but it does show up in Google search after 2-3 days.

Name for articles: article_002222.php


shalliday
User

Jan 16, 2006, 9:13 AM

Post #17 of 36 (3348 views)
Re: [DefenceTalk.com] Google News bot doesn't crawl the news page anymore?

Any suggestions on how we could best implement this with Article Manager? I am very interested in getting our site's news into Google News but am not sure how best to proceed with setting it up.

I am also wondering: if I changed the publish directory name for a given category from, say, "artman" to "news", would Article Manager leave or delete the old directory and articles the next time "Publish All" was selected?

Scott


Quote
We cannot include your site in Google News at this time because your articles are set up as posts or threads.

1. Set up your site in standard HTML format.

2. Each page that displays an article's full text needs a unique, static URL. We cannot include sites in Google News that display multiple articles from the same URL.

3. The URL for each article must contain a unique number consisting of at least three digits.

For example, our news crawler would not crawl these URLs: http://www.google.com/lemurs_in_the_mist.html
http://www.google.com/news/article23.html

It would crawl these pages: http://www.google.com/news/08112003/article.html
http://www.google.com/news/lemurs_in_the_mist/23467.html

4. Don't include a date in the URL. These URLs often change regularly. Since we're unable to detect the most current URL, we can't crawl the site for new content.

5. Create an HTML, text-based link structure. We are unable to crawl links pointing from graphics or embedded in JavaScript.



(This post was edited by shalliday on Jan 16, 2006, 9:16 AM)


Cliff
Staff


Jan 16, 2006, 3:18 PM

Post #18 of 36 (3342 views)
Re: [shalliday] Google News bot doesn't crawl the news page anymore?

Hi shalliday,

Thanks for posting.

Google's requirements are pretty simple; the main change is in the numbering, since each article URL needs a number of at least three digits.

Article Manager, by default, will post an article in the following format:

article_##.shtml

So you'll just have to make sure that there are always at least three digits there. The difficulty is that Article Manager generates filenames with fewer than three digits for its early articles. So my recommendation would be to simply tell it to add in some extra zeros.

To do this, go into the Setup Options and choose Server. Then scroll down to Publish File Settings. Find "Article Pages" where the default is set to article_

Change this to article_00

That way, instead of article_1.shtml, it will create article_001.shtml. This will meet Google's requirements, and it's pretty easy to get going.
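Cliff's fix works by baking zeros into the fixed prefix, so whatever number Article Manager appends, the name ends up with at least three digits. The Python sketch below compares that with zero-padding the number itself to a fixed width; `padded_name` is hypothetical and shown only for comparison, since the thread only mentions the prefix setting.

```python
def prefixed_name(n: int, prefix: str = "article_00", ext: str = ".shtml") -> str:
    """Cliff's workaround: extra zeros live in the fixed filename prefix."""
    return f"{prefix}{n}{ext}"


def padded_name(n: int, width: int = 3, ext: str = ".shtml") -> str:
    """Hypothetical alternative: zero-pad the article number to a fixed width."""
    return f"article_{n:0{width}d}{ext}"


for n in (1, 25, 100):
    print(prefixed_name(n), padded_name(n))
```

With the prefix approach the digit count keeps growing (article_001, article_0025, article_00100), which still meets the three-digit rule; fixed-width padding keeps names the same length until the IDs outgrow the width.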

Changing the publish directory will make Article Manager publish to a new location, so any files in the old directory will reside there until you remove them. There won't be any link from Article Manager to them any more though, so in effect they would be 'orphaned'.

I hope that helps!
Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com


shalliday
User

Jan 17, 2006, 1:44 PM

Post #19 of 36 (3312 views)
Re: [Cliff] Google News bot doesn't crawl the news page anymore?

Thank you Cliff

I have started making the changes you outlined; however, many of the published articles have file names specified in the Filename field on the article page. Is there an easy way to wipe out the individual file names assigned to each article so they go back to the default name Article Manager generates, or do I have to edit each individual article to remove them?

Also should the "Publish URL" field value change to equal the "Publish Dir" value?

Looking forward to your reply.

Thank you

Scott


(This post was edited by shalliday on Jan 17, 2006, 1:48 PM)


Ginslinger
User

Jan 18, 2006, 12:03 PM

Post #20 of 36 (3300 views)
Re: [Cliff] Google News bot doesn't crawl the news page anymore?

Hi Cliff, I tried your solution, but I'm getting file names like this: article_00%2088.shtml

Where am I picking up the %?


Cliff
Staff


Jan 18, 2006, 4:03 PM

Post #21 of 36 (3289 views)
Re: [Ginslinger] Google News bot doesn't crawl the news page anymore?

Hi Ginslinger,

Thanks for posting,

Looks to me like there may be a space at the end of your article name, so I'd double check that to make sure that there isn't.

I suspect that should fix that up for you, but let me know if you are still having troubles :)
Regards,
Cliff Stefanuk - Customer Service Manager
support@interactivetools.com


Ginslinger
User

Jan 18, 2006, 10:10 PM

Post #22 of 36 (3273 views)
Re: [Cliff] Google News bot doesn't crawl the news page anymore?

Hi Cliff, thanks for the reply. Actually, the space must have been at the end of where I put _00, as I just put the cursor at the end of it, hit Delete, and all is fine.

Thanks.
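The `%20` is the standard URL percent-encoding of a space character, which is why a stray trailing space in the prefix field shows up that way. A small Python illustration using `urllib.parse.quote`; `published_name` is a hypothetical helper, and the thread doesn't say whether Article Manager or the browser does the encoding.

```python
from urllib.parse import quote


def published_name(prefix: str, number: int, ext: str = ".shtml") -> str:
    """Hypothetical helper: prefix + article number + extension, percent-encoded for a URL."""
    return quote(f"{prefix}{number}{ext}")


print(published_name("article_00 ", 88))  # article_00%2088.shtml  (trailing space -> %20)
print(published_name("article_00", 88))   # article_0088.shtml
```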


shalliday
User

Jan 18, 2006, 10:53 PM

Post #23 of 36 (3272 views)
Re: [shalliday] Google News bot doesn't crawl the news page anymore?


In Reply To
Thank you Cliff

I have started making the changes you outlined however many of the articles published have file names specified in the Filename field on the article page. Is there an easy way to wipe out the individual file names assigned to each article so they go back to the default name article manager generates or do I have to edit each individual article to remove them?

Also should the "Publish URL" field value change to equal the "Publish Dir" value?

Looking forward to your reply.

Thank you

Scott


Hi Cliff, I'm wondering if you could help with the questions I posted. I'm just waiting for a reply so I can proceed.

Thanks

Scott


ross
Staff / Moderator


Jan 19, 2006, 11:12 AM

Post #24 of 36 (3246 views)
Re: [shalliday] Google News bot doesn't crawl the news page anymore?

Hi Scott.

Thanks for posting!

When you uncheck the filename box in Setup Options and hit Publish All, Article Manager should actually rename all your articles for you.

The publish URL and publish DIR are similar but will probably not be the same thing. The publish DIR is the path on your server that points to the publish folder. The publish URL is what you type into a web browser to see the publish folder.

Does that make sense? Let me know if you need any more details :)
-----------------------------------------------------------
Cheers,
Ross Fairbairn - Product Specialist
support@interactivetools.com
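Ross's distinction can be made concrete: the same published file has a filesystem path under Publish Dir and a web address under Publish URL. This is a Python sketch with made-up example values for both settings; `url_for` is a hypothetical helper, not part of Article Manager.

```python
import posixpath

# Made-up example settings; substitute your own server path and domain.
PUBLISH_DIR = "/home/example/public_html/news/publish"   # where files live on disk
PUBLISH_URL = "http://www.example.com/news/publish"      # what a browser requests


def url_for(file_path: str) -> str:
    """Translate a file under PUBLISH_DIR into the matching address under PUBLISH_URL."""
    rel = posixpath.relpath(file_path, PUBLISH_DIR)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{file_path} is not inside the publish directory")
    return f"{PUBLISH_URL.rstrip('/')}/{rel}"


print(url_for("/home/example/public_html/news/publish/article_001.shtml"))
```

The two settings share the trailing folder names but start from different roots, which is why they usually aren't equal.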

shalliday
User

Jan 19, 2006, 12:57 PM

Post #25 of 36 (3243 views)
Re: [ross] Google News bot doesn't crawl the news page anymore?

Thank you very much. Worked like a charm! A little more cleanup and I'll be ready to tackle the Google Sitemap feed. Yahoo! Thanks again for your help.
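For the sitemap step Scott mentions, here is a minimal Python sketch that builds a sitemap with `xml.etree.ElementTree`. It uses the namespace of the sitemaps.org protocol (0.9), which was published after this thread; Google's earlier Sitemaps format used its own namespace, so check what your target expects. The example URL and date are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries) -> str:
    """entries: iterable of (absolute URL, last-modified date as YYYY-MM-DD)."""
    urlset = ET.Element("urlset", xmlns=NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # ElementTree escapes & and < for us
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([("http://www.example.com/news/publish/article_001.shtml", "2006-01-19")]))
```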

Contact Us
Toll Free: 1-800-752-0455
Phone: (604) 689-3347
Copyright © interactivetools.com 2008
#201 - 2730 Commercial Drive, Vancouver BC Canada V5N 5P4