Add in initial README and license.
BSD licensed for now. We'll see where this takes us first.

commit 48e7ec74c5

LICENSE (new file, 23 lines)

Copyright (c) 2018, Stuart Longland
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice,
   this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

README.md (new file, 113 lines)

Hackaday.io Spam Hunter Project
===============================

The aim of this project is to produce tools that aid in the detection of
spambot user accounts that exist to do little more than spruik some business.

Most of these accounts share a few fairly rudimentary traits:

- They may feature an avatar with the logo of the company being advertised,
  lots of flat areas of colour, etc.
- They almost certainly give a web address of the business concerned, sometimes
  a phone number or physical address. Few *real* users do the latter two.
- They have often *followed* a good dozen or more projects within minutes of
  registering.
- If they publish projects or pages, this content shares the same traits and
  is often posted much faster than a typical human could type it.
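
The last two traits can be checked mechanically. Below is a minimal Python sketch, assuming the account's creation time is the Unix timestamp found in the API's `created` field and that a follow count and the time taken to post some text are available. The thresholds (`RAPID_FOLLOW_COUNT`, `RAPID_FOLLOW_WINDOW_S`, `MAX_HUMAN_CHARS_PER_S`) and both function names are hypothetical guesses, not tuned values.

```python
import time

# Hypothetical thresholds: "a good dozen" follows within "a few minutes".
RAPID_FOLLOW_COUNT = 12
RAPID_FOLLOW_WINDOW_S = 10 * 60

# Hypothetical ceiling on sustained human typing speed in characters per
# second (about 100 words per minute at roughly 6 characters per word).
MAX_HUMAN_CHARS_PER_S = 10.0


def is_rapid_follower(created, follow_count, now=None,
                      min_follows=RAPID_FOLLOW_COUNT,
                      window_s=RAPID_FOLLOW_WINDOW_S):
    """True if an account no older than window_s seconds already has
    min_follows or more follows. `created` is a Unix timestamp."""
    if now is None:
        now = time.time()
    age_s = now - created
    return age_s <= window_s and follow_count >= min_follows


def is_typed_too_fast(text, seconds_taken, max_rate=MAX_HUMAN_CHARS_PER_S):
    """True if `text` appeared faster than a human could plausibly type it."""
    if seconds_taken <= 0:
        # Any non-empty text posted "instantly" is suspicious.
        return len(text) > 0
    return len(text) / seconds_taken > max_rate
```

Either check on its own would misfire on some genuine users, so these are better treated as signals for a human reviewer than as verdicts.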

How this will work
==================

We begin by looking at the full list of users, which can be retrieved via the
[users API endpoint](https://dev.hackaday.io/doc/api/get-users). For the sort of
user we want to target, a record looks something like this:

```
{
    "about_me": "<a target=\"_blank\" rel=\"noopener noreferrer\" href=\"http://example.com\">example.com</a>",
    "created": 1515198877,
    "followers": 1,
    "following": 1,
    "id": 123456789,
    "image_url": "https://cdn.hackaday.io/images/default-avatar.png",
    "location": "",
    "projects": 0,
    "rank": 1000000,
    "screen_name": "aspamuser",
    "skulls": 0,
    "tags": null,
    "url": "https://hackaday.io/aspamuser",
    "username": "aspamuser",
    "what_i_have_done": "",
    "what_i_would_like_to_do": "",
    "who_am_i": ""
}
```
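
A minimal sketch of walking that user list page by page with only the Python standard library. The base URL `https://api.hackaday.io/v1/users`, the `api_key`/`page`/`per_page` query parameters and the `users`/`last_page` keys in each response page are assumptions to check against the API documentation, and `users_page_url`, `fetch_json` and `iter_users` are hypothetical helper names. The fetch function is injectable so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# Assumption: base URL of the users endpoint.
USERS_ENDPOINT = "https://api.hackaday.io/v1/users"


def users_page_url(api_key, page, per_page=50):
    """Build the URL for one page of the user listing (assumed parameters)."""
    query = urllib.parse.urlencode(
        {"api_key": api_key, "page": page, "per_page": per_page})
    return f"{USERS_ENDPOINT}?{query}"


def fetch_json(url):
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_users(api_key, fetch=fetch_json, per_page=50):
    """Yield user records page by page until the last page is reached.

    Assumes each page looks like {"users": [...], "page": N, "last_page": M}.
    """
    page = 1
    while True:
        data = fetch(users_page_url(api_key, page, per_page))
        yield from data.get("users") or []
        # Stop when the server says this was the last page (or doesn't say).
        if page >= data.get("last_page", page):
            break
        page += 1
```

Because `iter_users` is a generator, records are only held in memory while they are being examined.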

Sometimes the profile itself looks benign, like this:

```
{
    "about_me": "how to hack into someones snapchat",
    "created": 1515199252,
    "followers": 1,
    "following": 1,
    "id": 12345678,
    "image_url": "https://cdn.hackaday.io/images/default-avatar.png",
    "location": "",
    "projects": 0,
    "rank": 1000000,
    "screen_name": "aspamuser",
    "skulls": 0,
    "tags": null,
    "url": "https://hackaday.io/aspamuser",
    "username": "aspamuser",
    "what_i_have_done": "",
    "what_i_would_like_to_do": "",
    "who_am_i": ""
}
```

… but then the user's links point elsewhere:

```
{
    "last_page": 1,
    "links": [
        {
            "id": 12345678,
            "title": "how to hack into someones snapchat",
            "type": "other",
            "url": "https://example.com/"
        }
    ],
    "page": 1,
    "per_page": 1,
    "total": 1
}
```
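
A response like the one above can be reduced to the outside sites a user points at. Only the response shape is taken from the example; the `/users/{id}/links` path under an assumed `https://api.hackaday.io/v1` base, and the helper names `user_links_url` and `external_hosts`, are hypothetical.

```python
import urllib.parse

# Assumption: per-user links are served at /users/{id}/links on this base.
API_BASE = "https://api.hackaday.io/v1"


def user_links_url(api_key, user_id, page=1):
    """Build the URL for one page of a user's links (assumed path/parameters)."""
    query = urllib.parse.urlencode({"api_key": api_key, "page": page})
    return f"{API_BASE}/users/{user_id}/links?{query}"


def external_hosts(links_response):
    """Return the hostnames a links response points to, in order of first
    appearance and without duplicates."""
    hosts = []
    for link in links_response.get("links") or []:
        host = urllib.parse.urlsplit(link.get("url", "")).hostname
        if host and host not in hosts:
            hosts.append(host)
    return hosts
```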

Based on this, the `about_me` and `who_am_i` fields and the user's links are the
obvious places to look when identifying such users.
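
Since `about_me` holds HTML, link targets are easier to pull out with a parser than by matching raw text. A sketch using the standard library's `html.parser`; `profile_evidence` is a hypothetical helper that returns the `href` values plus the plain text of the two fields, ready for pattern matching.

```python
from html.parser import HTMLParser


class _LinkAndTextCollector(HTMLParser):
    """Collect href attributes and visible text from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text.append(data)


# Free-text profile fields named above; the record shape follows the examples.
PROFILE_FIELDS = ("about_me", "who_am_i")


def profile_evidence(user):
    """Return (hrefs, plain_text) gathered from the free-text profile fields."""
    collector = _LinkAndTextCollector()
    for field in PROFILE_FIELDS:
        # Missing or null fields are treated as empty strings.
        collector.feed(user.get(field) or "")
    collector.close()
    text = " ".join(t.strip() for t in collector.text if t.strip())
    return collector.hrefs, text
```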

The first step will be to grab this information from the API and cache it
temporarily (probably in RAM, since we don't want to keep it long-term), then
pick out the accounts whose text matches patterns for URIs, telephone numbers
or physical addresses.
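
The pattern-matching step might look like the following. These regular expressions are rough illustrations chosen for this sketch, not validated detectors: the URI pattern only catches `http(s)://` or `www.` prefixes, the phone pattern accepts at least eight characters of digits and common separators that start and end with a digit, and the address pattern only recognises a house number, a capitalised street name and an English street-type suffix. `PATTERNS` and `match_categories` are hypothetical names.

```python
import re

# Illustrative patterns only: expect both misses and false positives.
PATTERNS = {
    # http(s) URLs, or bare "www." hostnames.
    "uri": re.compile(r"\b(?:https?://|www\.)[^\s<>\"']+", re.IGNORECASE),
    # Optional "+", then 8+ characters of digits, spaces, dots, dashes or
    # brackets that start and end with a digit. The lookbehind stops matches
    # from starting mid-word or inside a URL path such as /project/12345678.
    "phone": re.compile(r"(?<![\w/])\+?\d[\d\s().-]{6,}\d\b"),
    # A house number, one to three capitalised words, then a street type.
    "address": re.compile(
        r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s+){1,3}"
        r"(?:Street|St|Road|Rd|Avenue|Ave|Boulevard|Blvd|Lane|Ln|Drive|Dr)\b"),
}


def match_categories(text):
    """Return the sorted names of the patterns that match somewhere in text."""
    return sorted(name for name, pattern in PATTERNS.items()
                  if pattern.search(text))
```

For example, `match_categories("Call 555-123-4567 or visit http://example.com")` gives `["phone", "uri"]`, flagging the account for human review.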

So that we don't repeat ourselves, we should persistently store at least the
profile IDs of users we have already "seen": there's a good chance the pattern
matching will flag false positives, and those shouldn't be re-examined on
every run.
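
One way to keep that record is a small SQLite table via Python's built-in `sqlite3` module. The `SeenStore` class, its table layout and the `seen.sqlite3` file name are illustrative choices for this sketch.

```python
import sqlite3
import time


class SeenStore:
    """Persist the IDs of profiles already examined so later runs skip them.

    The table name and columns are illustrative, not an existing schema.
    """

    def __init__(self, path="seen.sqlite3"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            " user_id INTEGER PRIMARY KEY,"
            " first_seen INTEGER NOT NULL)")
        self.conn.commit()

    def is_seen(self, user_id):
        row = self.conn.execute(
            "SELECT 1 FROM seen WHERE user_id = ?", (user_id,)).fetchone()
        return row is not None

    def mark_seen(self, user_id, when=None):
        """Record user_id; marking an ID that is already stored is a no-op."""
        stamp = int(when if when is not None else time.time())
        self.conn.execute(
            "INSERT OR IGNORE INTO seen (user_id, first_seen) VALUES (?, ?)",
            (user_id, stamp))
        self.conn.commit()

    def unseen(self, users):
        """Yield only the user records whose "id" has not been stored yet."""
        for user in users:
            if not self.is_seen(user["id"]):
                yield user
```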

A human can then decide whether each user is genuine and update the record
accordingly. If a user is not genuine, the reviewer can then proceed to the
profile page to report them. This will likely require OAuth authentication, and
require the reviewer to have "joined" this project.
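
The review step could be tracked in a similar way. In this sketch the schema and the names `open_review_db`, `flag`, `record_verdict` and `profiles_to_report` are hypothetical. Flagged users wait with no verdict, and a reviewer's decision is stored as `genuine` or `spam`. `profiles_to_report` only lists the profile URLs to visit; the report itself is still filed by hand on the website.

```python
import sqlite3

# Hypothetical verdict values; NULL in the table means "not yet reviewed".
GENUINE, SPAM = "genuine", "spam"


def open_review_db(path="review.sqlite3"):
    """Open (or create) an illustrative table of flagged users and verdicts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review ("
        " user_id INTEGER PRIMARY KEY,"
        " profile_url TEXT NOT NULL,"
        " verdict TEXT CHECK (verdict IN ('genuine', 'spam')))")
    conn.commit()
    return conn


def flag(conn, user):
    """Queue a flagged user record (shaped like the API example) for review."""
    conn.execute(
        "INSERT OR IGNORE INTO review (user_id, profile_url) VALUES (?, ?)",
        (user["id"], user["url"]))
    conn.commit()


def record_verdict(conn, user_id, verdict):
    """Store the human reviewer's decision for one user."""
    if verdict not in (GENUINE, SPAM):
        raise ValueError(f"unknown verdict: {verdict!r}")
    conn.execute("UPDATE review SET verdict = ? WHERE user_id = ?",
                 (verdict, user_id))
    conn.commit()


def profiles_to_report(conn):
    """Profile URLs judged to be spam, for the reviewer to report manually."""
    rows = conn.execute(
        "SELECT profile_url FROM review WHERE verdict = ? ORDER BY user_id",
        (SPAM,))
    return [url for (url,) in rows]
```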

What this project is not
========================

- We won't be banning users or filing spam reports in any sort of automated
  fashion.
- We will *not* be undertaking any vigilante action: the aim here is to
  identify the accounts so they can be removed. If SupplyFrame decide to take
  action against the business concerned, that is their decision to make, not
  ours.