From 48e7ec74c5ec0f67254fd8c6a9ae63bbb8ee77f3 Mon Sep 17 00:00:00 2001
From: Stuart Longland
Date: Sat, 6 Jan 2018 13:16:55 +1000
Subject: [PATCH] Add in initial README and license.

BSD licensed for now.  We'll see where this takes us first.
---
 LICENSE   |  23 +++++++++++
 README.md | 113 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100644 LICENSE
 create mode 100644 README.md

diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..d3f871c
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,23 @@
+Copyright (c) 2018, Stuart Longland
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice,
+   this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright
+   notice, this list of conditions and the following disclaimer in the
+   documentation and/or other materials provided with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..5893376
--- /dev/null
+++ b/README.md
@@ -0,0 +1,113 @@
+Hackaday.io Spam Hunter Project
+===============================
+
+The aim of this project is to produce tools that aid in the detection of
+spambot user accounts, intended to do little more than spruik some business.
+
+Most of these accounts share common traits that are fairly rudimentary:
+
+- They may feature an avatar with the logo of the company being advertised,
+  lots of flat areas of colour, etc.
+- They almost certainly give a web address of the business concerned, sometimes
+  a phone number or physical address.  Few *real* users do the latter two.
+- They often have *followed* a good dozen or more projects in the few minutes
+  they have been registered.
+- If they publish projects or pages, this content shares the same traits and
+  is often posted much faster than the typical human would be able to type.
+
+How this will work
+==================
+
+We begin by looking at the full list of users which can be retrieved via the
+[users API endpoint](https://dev.hackaday.io/doc/api/get-users).  For the sorts of users we want to target, it looks something like this:
+
+```
+{
+    "about_me": "example.com",
+    "created": 1515198877,
+    "followers": 1,
+    "following": 1,
+    "id": 123456789,
+    "image_url": "https://cdn.hackaday.io/images/default-avatar.png",
+    "location": "",
+    "projects": 0,
+    "rank": 1000000,
+    "screen_name": "aspamuser",
+    "skulls": 0,
+    "tags": null,
+    "url": "https://hackaday.io/aspamuser",
+    "username": "aspamuser",
+    "what_i_have_done": "",
+    "what_i_would_like_to_do": "",
+    "who_am_i": ""
+}
+```
+
+or sometimes the account looks benign, like this:
+
+```
+{
+    "about_me": "how to hack into someones snapchat",
+    "created": 1515199252,
+    "followers": 1,
+    "following": 1,
+    "id": 12345678,
+    "image_url": "https://cdn.hackaday.io/images/default-avatar.png",
+    "location": "",
+    "projects": 0,
+    "rank": 1000000,
+    "screen_name": "aspamuser",
+    "skulls": 0,
+    "tags": null,
+    "url": "https://hackaday.io/aspamuser",
+    "username": "aspamuser",
+    "what_i_have_done": "",
+    "what_i_would_like_to_do": "",
+    "who_am_i": ""
+}
+```
+… but then it has links elsewhere:
+
+```
+{
+    "last_page": 1,
+    "links": [
+        {
+            "id": 12345678,
+            "title": "how to hack into someones snapchat",
+            "type": "other",
+            "url": "https://example.com/"
+        }
+    ],
+    "page": 1,
+    "per_page": 1,
+    "total": 1
+}
+```
+
+Based on this, the `about_me`, `who_am_i` and links are definite places we can
+look to identify such users.
+
+The first step will be to grab the information from the API and cache it
+temporarily, probably in RAM since we don't want to keep it long-term, and pick
+out those accounts that have string patterns that match URIs, telephone
+numbers or physical addresses.
+
+To avoid repeating ourselves, we should persistently store at least the
+profile IDs of users we have "seen" already, as there's a good chance that
+some of these matches will be false positives.
+
+A human can then decide whether the user is genuine or not, and the record is
+updated accordingly.  If not genuine, they can then proceed to the profile page
+to report the user.  This will likely require OAuth authentication and require
+the user to be "joined" to this project.
+
+What this project is not
+========================
+
+- We won't be banning users or filing spam reports in any sort
+  of automated fashion.
+- We will *not* be undertaking any vigilante action: the aim here is to
+  identify the accounts so they can be removed.  If SupplyFrame decide to take
+  action against the business concerned, that is their decision to make, not
+  ours.
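Review note: the first step described under "How this will work" (pick out accounts whose `about_me`, `who_am_i` or links contain URI or telephone patterns, and remember which profile IDs have been seen) could be sketched roughly as below. This is only an illustration under assumptions, not the project's implementation: the function names `suspicious_reasons` and `pick_candidates`, the two regular expressions, and the `links_by_id` mapping are all hypothetical, and the user records are assumed to be already-fetched dictionaries shaped like the README samples. No API call is made, physical addresses are not handled, and persisting the seen IDs is left to the caller.

```python
import re

# Fields the README names as likely places for spam content.
TEXT_FIELDS = ("about_me", "who_am_i")

# Deliberately loose, untuned patterns (assumptions for illustration only).
# Expect false positives, e.g. "Node.js" looks like a host name.
URI_RE = re.compile(
    r"(?:https?://|www\.)\S+|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}\b",
    re.IGNORECASE,
)
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def suspicious_reasons(user, links=()):
    """Return human-readable reasons why a user record deserves a look."""
    reasons = []
    for field in TEXT_FIELDS:
        text = user.get(field) or ""  # the API may return null or ""
        if URI_RE.search(text):
            reasons.append(f"{field}: looks like a URI")
        if PHONE_RE.search(text):
            reasons.append(f"{field}: looks like a phone number")
    if links:
        # Genuine users add links too, so this alone is weak evidence.
        reasons.append(f"links: {len(links)} external link(s)")
    return reasons


def pick_candidates(users, links_by_id, seen_ids):
    """Flag unseen users for human review and record every ID inspected.

    seen_ids is a plain set here; the README suggests storing it
    persistently so the same profile is not reviewed twice.
    """
    candidates = []
    for user in users:
        uid = user["id"]
        if uid in seen_ids:
            continue
        seen_ids.add(uid)
        reasons = suspicious_reasons(user, links_by_id.get(uid, ()))
        if reasons:
            candidates.append((uid, reasons))
    return candidates
```

The result is only a list of candidates with reasons attached; in keeping with the non-goals above, the decision and any report stay with a human.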