Recently we have begun to see an increase in spam generated from some of our web (HTML) forms. How do you deal with this? I'm concerned that some solutions (e.g., CAPTCHA) may be inaccessible.
We are experimenting with the reCAPTCHA API for some of our forms. Its audio feature was quite appealing. It may not be a system-wide solution, but to us it was worth a look. http://recaptcha.net/learnmore.html
Another interesting method I have seen is to add
enctype="multipart/form-data"
to your form and check for it when processing the submission. I guess most bots don't know to change their encoding type. Granted, this is only a short-term fix until they figure that out.
The other ones I know of are mostly JavaScript-based, which I won't use on the off chance that a visitor doesn't have it enabled.
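For anyone who wants to try the enctype idea, here is a minimal sketch of the server-side check in Python. It assumes your form handler can read the raw Content-Type request header; the function name `is_multipart_submission` is made up for illustration and is not from any framework.

```python
def is_multipart_submission(content_type_header):
    """Return True if the request was sent as multipart/form-data.

    content_type_header is the raw Content-Type request header (or None).
    """
    if not content_type_header:
        return False
    # The header usually looks like "multipart/form-data; boundary=----abc",
    # so only the media type before the first ";" is compared.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "multipart/form-data"


if __name__ == "__main__":
    print(is_multipart_submission("multipart/form-data; boundary=x"))
    print(is_multipart_submission("application/x-www-form-urlencoded"))
```

A handler would reject (or quietly drop) any submission where this returns False. As noted above, it only works until bots start sending the right encoding type.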
I recently tried a concept that seems to work well. It's based on the fact that most bots only visit a form page once, so that they can harvest the form and all its field names, action URL, etc. This info about the form is then stored in their database somehow, and is then used to constantly hammer the action URL by sending values for all the fields.
(Before continuing, let me say that I took a pretty detailed look at my server logs, found IP addresses of bots sending form spam, and discovered that those IP addresses were not actually visiting the form page itself. They were only submitting to the action URL.)
So, knowing this, I created a method that logs the IP address of every visit to a form. So when you visit a web page with a form on it, your IP address is logged, and you now have permission to actually submit the form. Then I changed my form handling program to check the incoming IP address against that log, to make sure that it has permission to submit the form.
In other words, you're not allowed to submit a form unless you actually visit the form web page first. Most bots do not visit the form web page first, and therefore almost all form submissions are now coming from humans.
The main reason I like this solution is that it's all server-based. No JavaScript. No CAPTCHA. Nothing extra for users to do. And I got a fantastic response from the form submission recipients. Form spam dropped to virtually zero!
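Roughly, the visit-then-submit check could be sketched like this in Python. This is an illustration, not the actual program: the `VisitLog` class, its method names, and the one-hour expiry are all assumptions, and a real setup would keep the log in a database or file rather than in memory.

```python
import time


class VisitLog:
    """Remembers which IP addresses have loaded the form page recently."""

    def __init__(self, ttl_seconds=3600):
        # ttl_seconds is an assumed expiry; the original description
        # does not say how long a logged visit stays valid.
        self.ttl_seconds = ttl_seconds
        self._seen = {}  # ip address -> time of last form-page visit

    def record_visit(self, ip, now=None):
        """Call this when the form page itself is served."""
        self._seen[ip] = time.time() if now is None else now

    def may_submit(self, ip, now=None):
        """Call this in the form handler before accepting a submission."""
        now = time.time() if now is None else now
        visited_at = self._seen.get(ip)
        return visited_at is not None and now - visited_at <= self.ttl_seconds
```

The form page calls `record_visit()` with the visitor's IP address, and the handler refuses anything for which `may_submit()` returns False.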
Then came a problem. A number of visitors started reporting that they could not submit our forms. After investigating IP addresses, I found that they were all AOL users. It seems that AOL is doing some stuff with their IP address allocation, so that a user's IP address can actually change during their online session. This of course throws my whole theory out the window.
So, I currently have this whole thing turned off, and I'm hoping to figure out a way to still use it. I figured I'd post it here anyway, with the hope that maybe someone else can expand on it with a good idea.
AOL uses proxy servers that make many users look like one IP address and, as you discovered, sometimes make the same user look like different IP addresses.
You can check the referrer in the script for the action URL to verify that the request came from the form page. That gives you about the same amount of security without all the extra work, and it avoids the AOL problem. Referrers can be spoofed, but so can IPs.
Most of the bots I have dealt with are smart enough to spoof the referrer, so that method hasn't been very fruitful for me.
Though, Scott, I think a better way to handle your situation is this: instead of putting an IP address in your DB, generate a random seed, store it in the DB, and place it in a hidden field in the form. After the person submits the form, delete the seed from the database (to remove the chance that you later generate the same seed and the person can't submit). Then when the bots try to keep resubmitting it, they won't be able to.
I think that method would fix the issues you had with the previous solution.
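A referrer check along those lines might look like the following Python sketch. It assumes the handler can read the Referer request header and knows the form page's URL; `referer_matches_form` and the example.edu URLs are hypothetical.

```python
from urllib.parse import urlsplit


def referer_matches_form(referer_header, form_url):
    """Return True if the Referer header points at the form page.

    Compares scheme, host, and path; the query string is ignored.
    """
    if not referer_header:
        return False
    ref = urlsplit(referer_header)
    form = urlsplit(form_url)
    return (ref.scheme, ref.netloc.lower(), ref.path) == (
        form.scheme,
        form.netloc.lower(),
        form.path,
    )


if __name__ == "__main__":
    form_page = "http://example.edu/contact.html"  # hypothetical URL
    print(referer_matches_form("http://example.edu/contact.html?x=1", form_page))
    print(referer_matches_form("http://spam.example/contact.html", form_page))
```

Keep in mind that some browsers and privacy tools omit the header entirely, so a strict check like this can also turn away real visitors.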
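Here is a rough Python sketch of the random-seed idea, in case it helps. The `FormTokenStore` class and its method names are invented for illustration, and an in-memory set stands in for the database table.

```python
import secrets


class FormTokenStore:
    """Issues one-time tokens for a hidden form field and redeems them once."""

    def __init__(self):
        # Stand-in for the database table of outstanding tokens.
        self._issued = set()

    def issue(self):
        """Call when serving the form; put the result in a hidden field."""
        token = secrets.token_urlsafe(32)
        self._issued.add(token)
        return token

    def redeem(self, token):
        """Call in the form handler; a token is accepted only once."""
        if token in self._issued:
            self._issued.discard(token)  # delete so it cannot be replayed
            return True
        return False
```

The form page calls `issue()` and writes the value into a hidden input; the handler accepts the submission only if `redeem()` returns True. A bot that harvested the form once and replays the same hidden value gets rejected on every attempt after the first, and the check does not depend on the visitor's IP address at all.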
Yeah, when I did my initial look into the form spam in my server logs, I found a lot of spoofed HTTP_REFERERs. Their initial harvesting process just logged the URL of the original form and simply sent that with the form submissions. I know that IP addresses can be spoofed too, but after banging heads with a couple of other IT minds, we figured that using the IP address is more reliable (less often spoofed).
I also know about AOL's caching system (as well as the ability for any ISP to do the same). But I only thought that the risk would be multiple visitors with the same IP address. What I didn't realize is that an AOL user doesn't use the same cache server throughout the life of a single online session, hence the possibility of changing IP addresses.
I still like my original concept of ensuring that a user actually visits the form first before submitting. I like a couple of other ideas posted here too.