Web scraping is one of those subjects that often appears in python discussions. There are many ways to do it, and there doesn't seem to be one best way. There are fully fledged frameworks like scrapy and more lightweight libraries like mechanize. Do-it-yourself solutions are also popular: one can go a long way by using requests and beautifulsoup or pyquery. The reason for this diversity is that "scraping" actually covers multiple problems: you don't need the same tool to extract data from hundreds of pages as you do to automate a web workflow (like filling a few forms and getting some data back).

I like the do-it-yourself approach because it's flexible, but it's not well suited for massive data extraction: requests performs requests synchronously, and many requests mean a long wait. In this blog post, I'll present an alternative to requests based on the new asyncio library: aiohttp. I use it to write small scrapers that are really fast, and I'll show you how.

Basics of asyncio

Asyncio is the asynchronous IO library that was introduced in python 3.4 (on python 3.3, you can also get it from pypi). It's quite complex and I won't go too much into detail; instead, I'll explain what you need to know to write asynchronous code with it. If you want to know more, I invite you to read its documentation. To make it simple, there are two things you need to know about: coroutines and event loops. Coroutines are like functions, but they can be suspended and resumed at certain points in the code.
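To make that concrete, here is a minimal sketch (not from the original post) using the generator-based coroutine style that python 3.4 supported, before the async/await syntax arrived in 3.5. Each coroutine suspends itself at `yield from asyncio.sleep(...)`, and the event loop resumes it later while other coroutines run in the meantime:

```python
import asyncio

@asyncio.coroutine
def greet(delay, name):
    # The coroutine suspends here; the event loop is free to run
    # other coroutines until the sleep is over, then resumes this one.
    yield from asyncio.sleep(delay)
    print('Hello,', name)

# The event loop drives the coroutines until all of them are done.
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(
    greet(1, 'world'),
    greet(1, 'asyncio'),
))
loop.close()
```

Both greetings print after roughly one second in total rather than two, because the event loop interleaves the coroutines while each one is suspended.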