mirror of https://github.com/sjlongland/tornado-news.git
synced 2025-09-13 10:03:14 +10:00
Initial check-in of Tornado News
commit 10f2c56945
340
COPYING
Normal file
@@ -0,0 +1,340 @@
                    GNU GENERAL PUBLIC LICENSE
                       Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.
     51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

                            Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

                    GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.

  1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.

You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.

  2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:

    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.

    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.

    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)

These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.

Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.

In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.

  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:

    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,

    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,

    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)

The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.

If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.

  4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.

  5. You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.

  6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.

  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.

If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.

It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.

This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.

  8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.

  9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.

Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.

  10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.

                            NO WARRANTY

  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.

  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.

                     END OF TERMS AND CONDITIONS

            How to Apply These Terms to Your New Programs

  If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.

    <one line to give the program's name and a brief idea of what it does.>
    Copyright (C) <year>  <name of author>

    This program is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation; either version 2 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA


Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this
when it starts in an interactive mode:

    Gnomovision version 69, Copyright (C) year name of author
    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
    This is free software, and you are welcome to redistribute it
    under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License.  Of course, the commands you use may
be called something other than `show w' and `show c'; they could even be
mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your
school, if any, to sign a "copyright disclaimer" for the program, if
necessary.  Here is a sample; alter the names:

  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
  `Gnomovision' (which makes passes at compilers) written by James Hacker.

  <signature of Ty Coon>, 1 April 1989
  Ty Coon, President of Vice

This General Public License does not permit incorporating your program into
proprietary programs.  If your program is a subroutine library, you may
consider it more useful to permit linking proprietary applications with the
library.  If this is what you want to do, use the GNU Library General
Public License instead of this License.
55
example/config.yml
Normal file
@@ -0,0 +1,55 @@
# Tornado news reader/aggregator configuration file.
# Feed metadata
meta:
  title: My News Feed
  link: http://example.com/news/
  description: A private aggregation of news articles
  language: en

owner:
  name: John Smith
  email: jsmith@example.com

cache: /tmp/cache

output:
  html: /var/www/localhost/index.html
  rss: /var/www/localhost/feed.rss

# A basic template for the website
html_templates:
  base: >
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type"
    content="text/html; charset=utf-8" />
    <title>{{title}}</title>
    <link rel="stylesheet" type="text/css" href="style.css" />
    <link rel="alternate" href="/feed.rss" title="" type="application/rss+xml" />
    </head>
    <body>
    <h1>{{title}}</h1>
    <hr />
    {% raw entries %}
    </body>
    </html>
  entry: >
    <div class="entry border">
    <div class="entry header">
    <h1 class="entry header title"><a href="{{link}}" name="{{anchor}}">[{{source}}] {{title}}</a></h1>
    <p class="entry header meta">
    <span class="entry header author">{{author}}</span>
    <span class="entry header timestamp">{{updated_str}}</span>
    </p>
    </div>
    <div class="entry body">
    {% raw content %}
    </div>
    </div>

# Feed sources, have as many as you like
sources:
  - name: Tornado
    url: https://groups.google.com/forum/feed/python-tornado-announce/msgs/rss.xml?num=15
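The `cache` setting in example/config.yml names a root directory. In tornadonews/tornadonews.py (part of this commit), `FeedFetcher._get_dir_for_url` derives a per-feed subdirectory from the SHA-1 of the feed URL. The following is a minimal sketch of that layout using only the standard library. The helper name `cache_dir_for_url` and the example URL are assumptions made for illustration and do not appear in the commit.

```python
from hashlib import sha1
from os import path


def cache_dir_for_url(cache_root, url):
    """Hypothetical helper mirroring FeedFetcher._get_dir_for_url."""
    if cache_root is None:
        # No cache configured: nothing is stored on disk.
        return None
    url_hash = sha1(url.encode('UTF-8')).hexdigest()
    # Two levels of two-character fan-out, then the remaining 36 hex digits.
    return path.join(cache_root, url_hash[0:2], url_hash[2:4], url_hash[4:])


if __name__ == '__main__':
    # Illustrative URL only; any feed address is hashed the same way.
    print(cache_dir_for_url('/tmp/cache', 'http://example.com/feed.rss'))
```

Going by the code in tornadonews.py, each such directory appears to hold the files `body`, `headers.yml` and `entries.yml` for one feed.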
15
example/style.css
Normal file
@@ -0,0 +1,15 @@
body {
    background-color: #000;
    color: #fff;
}

a {
    color: #00f;
}

div.entry.border {
    margin: 3em;
    border: 1px solid #fff;
    background-color: #333;
    color: #fff;
}
16
setup.py
Normal file
@@ -0,0 +1,16 @@
#!/usr/bin/python
from setuptools import setup
from tornadonews import __version__

setup(name = 'tornadonews',
        version = __version__,
        description = 'Tornado News: a simple news aggregator',
        packages = [
            'tornadonews',
        ],
        entry_points = {
            'console_scripts': [
                'tornadonews=tornadonews.tornadonews:main',
            ],
        },
)
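The `console_scripts` entry in setup.py uses the setuptools form `name=module:attribute`: installing the package creates a `tornadonews` command that calls `main` from the module `tornadonews.tornadonews`. As a rough sketch of how that string breaks down, the hypothetical helper below (`split_entry_point`, not part of setuptools or of this commit) does plain string splitting only. It is not setuptools' own parser and ignores extras and other syntax.

```python
def split_entry_point(spec):
    """Split 'name=package.module:attr' into its parts (illustrative only)."""
    # Everything before the first '=' is the command name.
    name, _, target = spec.partition('=')
    # The target is 'module.path:callable'.
    module, _, attr = target.partition(':')
    return name.strip(), module.strip(), attr.strip()


if __name__ == '__main__':
    print(split_entry_point('tornadonews=tornadonews.tornadonews:main'))
```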
1
tornadonews/__init__.py
Normal file
@@ -0,0 +1 @@
__version__ = '0.0.1'
398
tornadonews/tornadonews.py
Normal file
@@ -0,0 +1,398 @@
#!/usr/bin/env python

"""
TornadoNews: Simple RSS/Atom aggregator written in Tornado.

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
"""

from tornado.template import Template
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop
import feedparser
import yaml
import argparse
from calendar import timegm
from functools import partial
import logging
from multiprocessing.pool import ThreadPool
from multiprocessing import cpu_count
from os import makedirs, path, stat
from hashlib import sha1
import feedgenerator
from six import StringIO, text_type, binary_type, PY3
import datetime

if PY3:
    # Doesn't seem to work in Python 3.  Until further notice, let's
    # disable this for now.
    # http://blog.yjl.im/2013/12/workaround-of-libxml2-unsupported.html
    feedparser.PREFERRED_XML_PARSERS.remove('drv_libxml2')


class FeedEntry(object):
    """
    FeedEntry: a simple object for storing the specifics of an article
    so that they can be grouped and sorted.
    """

    def __init__(self, source, entry_id, link, title, author,
            updated, content):
        """
        Construct a new FeedEntry object.
        """

        self._source    = text_type(source)
        self._id        = text_type(entry_id)
        self._author    = text_type(author)
        self._anchor    = None
        self._link      = text_type(link)
        self._title     = text_type(title)
        self._updated   = float(updated)
        self._content   = text_type(content)

    @classmethod
    def from_entry(cls, source, entry):
        """
        Parse the feedparser-generated entry dict and return a FeedEntry
        object from it.
        """
        return cls(
                source, entry['id'], entry['link'], entry['title'],
                entry.get('author') or 'Anonymous',
                timegm(entry.get('updated_parsed') or \
                        entry['published_parsed']),
                entry.get('content') or entry['summary'])

    @property
    def raw(self):
        """
        Dump the feed entry so that it can be serialised safely and
        later returned.
        """
        return {
                'source':   self.source,
                'entry_id': self.id,
                'link':     self.link,
                'title':    self.title,
                'author':   self.author,
                'updated':  self.updated,
                'content':  self.content,
        }

    @property
    def source(self):
        return self._source

    @property
    def id(self):
        return self._id

    @property
    def anchor(self):
        if self._anchor is None:
            self._anchor = sha1(self.id.encode('UTF-8')).hexdigest()
        return self._anchor

    @property
    def title(self):
        return self._title

    @property
    def author(self):
        return self._author

    @property
    def link(self):
        return self._link

    @property
    def updated(self):
        return self._updated

    @property
    def content(self):
        return self._content

    def __lt__(self, other):
        if not isinstance(other, FeedEntry):
            return NotImplemented
        return self.updated < other.updated

    def __unicode__(self):
        return u'[%s] %s' % (self.source, self.title)

    if PY3: # Python 3
        def __str__(self):
            return self.__unicode__()
    else:   # Python 2
        def __str__(self):
            return self.__unicode__().encode('utf8')

    def __repr__(self):
        return '<%s %s>' % (self.__class__.__name__, self)


class FeedFetcher(object):
    """
    FeedFetcher: A simplified RSS/Atom feed retriever and parser.  This
    uses Tornado's asynchronous HTTP client to retrieve the individual
    feeds as they're fed into the object using the fetch method.

    When the caller is done, they call collate, giving a callback function
    that receives the collated list at the end.
    """

    def __init__(self, cache=None, num_workers=None, io_loop=None):
        if io_loop is None:
            io_loop = IOLoop.current()

        if num_workers is None:
            num_workers = cpu_count()

        self._io_loop = io_loop
        self._log = logging.getLogger(self.__class__.__name__)
        self._client = AsyncHTTPClient()
        self._cache = cache
        self._entries = []
        self._pending = set()
        self._fetched = False
        self._pool = ThreadPool(processes=num_workers)
        self._on_done = None

    @property
    def entries(self):
        return list(self._entries)

    def fetch(self, name, url):
        """
        Fetch the feed named 'name', at the address, 'url'.
        """

        self._log.info('Retrieving %s (%s)', name, url)
        self._fetched = False
        self._pending.add(url)

        # Only ask for a conditional GET if we hold a cached copy.
        cache_dir = self._get_dir_for_url(url)
        if (cache_dir is not None) and path.isdir(cache_dir):
            if_modified_since = stat(path.join(cache_dir, 'body')).st_mtime
        else:
            if_modified_since = None

        self._client.fetch(url,
                callback=partial(self._on_get_done, name,
                    url, cache_dir),
                if_modified_since=if_modified_since)

    def _get_dir_for_url(self, url):
        if self._cache is None:
            return None

        url_hash = sha1(url.encode('UTF-8')).hexdigest()
        return path.join(
                self._cache, url_hash[0:2], url_hash[2:4], url_hash[4:])

    def _on_get_done(self, name, url, cache_dir, response):
        self._log.info('Finished retrieving %s (%s), result %s',
                name, url, response.reason)
        try:
            if response.code == 304:    # Not modified
                # Read from cache
                self._log.info('Not modified, read from cache')
                with open(path.join(cache_dir, 'body'), 'rb') as f:
                    body = f.read()
                cached = True
                self._log.debug('Body type: %s (cache)', type(body))

            else:
                # Check for exceptions
                response.rethrow()

                # Grab body data
                body = response.body
                cached = False
                self._log.debug('Body type: %s (http)', type(body))

                # Dump to cache
                if cache_dir is not None:
                    if not path.isdir(cache_dir):
                        makedirs(cache_dir)
                    # Write out headers
                    with open(path.join(cache_dir, 'headers.yml'), 'w') as f:
                        yaml.safe_dump(dict(response.headers), stream=f,
                                default_flow_style=False)
                    # Write out raw body
                    with open(path.join(cache_dir, 'body'), 'wb') as f:
                        f.write(body)

            # Hand off to thread pool
            self._pool.apply_async(self._process,
                    args=(name, url, body, cache_dir, cached))
        except Exception:
            self._log.exception('Failed to process feed %s (%s)',
                    name, url)
            self._mark_done(url)

    def _process(self, name, url, body, cache_dir, cached):
        self._log.info('Processing feed %s', name)
        try:
            # Without a cache directory there is no entries cache either;
            # path.join(None, ...) would raise and the feed would be lost.
            if cache_dir is not None:
                entries_cache = path.join(cache_dir, 'entries.yml')
            else:
                entries_cache = None
            entries = None
            try:
                if cached and path.isfile(entries_cache):
                    with open(entries_cache, 'r') as f:
                        entries = list(map(lambda e : FeedEntry(**e),
                            list(yaml.safe_load(f))))
            except Exception:
                self._log.debug('Failed to read cache, ignoring',
                        exc_info=1)
                cached = False

            if entries is None:
                parsed = feedparser.parse(body)

                # Extract the entries from the feed
                entries = list(map(partial(FeedEntry.from_entry, name),
                    parsed['entries']))

            if not cached and (cache_dir is not None):
                cache_out = yaml.safe_dump([e.raw for e in entries])
                with open(entries_cache, 'wb') as f:
                    f.write(cache_out.encode('UTF-8'))
        except Exception:
            self._log.exception('Failed to process feed %s (%s)',
                    name, url)
            entries = None

        # Hand back to main loop
        self._io_loop.add_callback(self._mark_done, url, entries)

    def _mark_done(self, url, entries=None):
        self._log.info('%s parsed', url)
        if entries is not None:
            self._entries.extend(entries)
        self._pending.discard(url)
        self._io_loop.add_callback(self._check_finished)

    def _check_finished(self):
        self._log.debug('Fetched? %s  Pending: %s',
                self._fetched, self._pending)
        if self._fetched and not bool(self._pending):
            self._io_loop.add_callback(self._emit)

    def collate(self, callback=None):
        """
        Wait for all feeds to be loaded, then collate the resulting
        entries together for display.
        """
        # No more to fetch after this.
        self._fetched = True
        self._io_loop.add_callback(self._check_finished)
        self._log.info('Waiting for fetch to complete')
        self._on_done = callback

    def _emit(self):
        self._log.info('Collating results')
        # Newest entries first.
        self._entries.sort(key=lambda e : e.updated, reverse=True)
        if self._on_done is not None:
            self._io_loop.add_callback(self._on_done, list(self._entries))


class FeedEmitter(object):
    """
    FeedEmitter: Simple news item aggregator and emitter.  This takes a
    list of FeedEntry objects and, optionally, some HTML templates.  The
    make_rss and make_html methods then generate RSS or HTML from these
    feed items.
    """

    def __init__(self, entries, html_base=None, html_entry=None):
        self._log = logging.getLogger(self.__class__.__name__)
        self._entries = entries
        self._html_base = html_base
        self._html_entry = html_entry
        self._log.info('Emitter constructed with %d entries',
                len(entries))

    def make_rss(self, **kwargs):
        self._log.info('Emitting RSS')
        rss = feedgenerator.Rss201rev2Feed(**kwargs)
        for entry in self._entries:
            rss.add_item(
                title=u'[%s] %s' % (entry.source, entry.title),
                link=entry.link,
                description=entry.content,
                author_name=entry.author,
                pubdate=datetime.datetime.fromtimestamp(entry.updated),
                unique_id=entry.id)
        out = StringIO()
        rss.write(out, 'utf-8')
        return out.getvalue()

    def make_html(self, **kwargs):
        self._log.info('Emitting HTML')
        t = Template(self._html_base)
        entries = '\n'.join(list(map(self._entry_to_html, self._entries)))
        return t.generate(entries=entries, **kwargs)

    def _entry_to_html(self, entry):
        t = Template(self._html_entry)
        return text_type(t.generate(source=entry.source,
            anchor=entry.anchor, id=entry.id,
            link=entry.link, title=entry.title,
            author=entry.author, updated=entry.updated,
            updated_str=self._date_to_str(entry.updated),
            content=entry.content), 'UTF-8')

    def _date_to_str(self, timestamp):
        return str(datetime.datetime.fromtimestamp(timestamp))


def main():
    parser = argparse.ArgumentParser(description='Feed parser/aggregator')
    parser.add_argument('config', metavar='CONFIG', type=str,
            help='Configuration file')
    args = parser.parse_args()
    # safe_load avoids constructing arbitrary Python objects from the
    # configuration file; the context manager closes the file.
    with open(args.config, 'r') as f:
        cfg = yaml.safe_load(f)
    logging.basicConfig(level=logging.DEBUG)

    ioloop = IOLoop.current()
    ioloop.add_callback(run, cfg)
    ioloop.start()

def run(cfg):
    fetcher = FeedFetcher(cache=cfg.get('cache'))
    for source in cfg['sources']:
        fetcher.fetch(**source)

    html_templates = cfg.get('html_templates', {})
    output = cfg.get('output', {})
    meta = cfg.get('meta', {})
    def emit(entries):
        try:
            emitter = FeedEmitter(entries,
                    html_entry=html_templates.get('entry'),
                    html_base=html_templates.get('base'))
            if output.get('html'):
                html = emitter.make_html(**meta)
                with open(output['html'], 'wb') as f:
                    f.write(html)
            if output.get('rss'):
                rss = emitter.make_rss(**meta)
                with open(output['rss'], 'w') as f:
                    f.write(rss)
        finally:
            # Stop the loop whether or not the output was written
            IOLoop.current().stop()

    fetcher.collate(emit)

if __name__ == '__main__':
    main()