Proxy management
IP address blocking is one of the oldest and most effective ways of preventing access to a website. A good web scraping library therefore needs to provide easy-to-use but powerful tools for working around IP blocking. The most powerful weapon in your anti-IP-blocking arsenal is a proxy server.
With the Apify SDK, you can use your own proxy servers, proxy servers acquired from third-party providers, or you can rely on Apify Proxy for your scraping needs.
Quick start
If you want to use Apify Proxy locally, make sure that you run your Actors via the Apify CLI and that you are logged in with your Apify account in the CLI.
Using Apify proxy
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration()
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy URL: {proxy_url}')
Using your own proxies
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://proxy-1.com',
                'http://proxy-2.com',
            ],
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy URL: {proxy_url}')
Proxy configuration
All your proxy needs are managed by the ProxyConfiguration class. You create an instance using the Actor.create_proxy_configuration() method. Then you generate proxy URLs using the ProxyConfiguration.new_url() method.
Apify proxy vs. your own proxies
The ProxyConfiguration class covers both Apify Proxy and custom proxy URLs, so that you can easily switch between proxy providers. However, some features of the class are available only to Apify Proxy users, mainly because Apify Proxy is what one would call a super-proxy. It's not a single proxy server, but an API endpoint that allows connections through millions of different IP addresses. So the class essentially has two modes: Apify Proxy or your own proxies.
The difference is easy to remember. Using the proxy_urls or new_url_function arguments enables use of your custom proxy URLs, whereas all the other options are there to configure Apify Proxy. Visit the Apify Proxy docs for more info on how these parameters work.
IP rotation and session management
The ProxyConfiguration.new_url() method accepts a session_id parameter. It is used to create a session_id-proxy_url pair, and subsequent new_url() calls with the same session_id will always return the same proxy_url. This is extremely useful in scraping, because a series of requests coming from the same IP address creates the impression of a single real user.
When no session_id is provided, your custom proxy URLs are rotated round-robin, whereas Apify Proxy manages their rotation using black magic to get the best performance.
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://proxy-1.com',
                'http://proxy-2.com',
            ],
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()  # http://proxy-1.com
        proxy_url = await proxy_configuration.new_url()  # http://proxy-2.com
        proxy_url = await proxy_configuration.new_url()  # http://proxy-1.com
        proxy_url = await proxy_configuration.new_url()  # http://proxy-2.com
        proxy_url = await proxy_configuration.new_url(
            session_id='a'
        )  # http://proxy-1.com
        proxy_url = await proxy_configuration.new_url(
            session_id='b'
        )  # http://proxy-2.com
        proxy_url = await proxy_configuration.new_url(
            session_id='b'
        )  # http://proxy-2.com
        proxy_url = await proxy_configuration.new_url(
            session_id='a'
        )  # http://proxy-1.com
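The rotation and session behavior shown above can be modeled in a few lines of plain Python. This is a simplified sketch, not the SDK's actual implementation: the RoundRobinRotator class and its attributes are hypothetical names, and it assumes that a session ID seen for the first time simply takes the next URL in the round-robin order.

```python
from __future__ import annotations

from itertools import cycle


class RoundRobinRotator:
    """Hypothetical model of round-robin rotation with sticky sessions."""

    def __init__(self, proxy_urls: list[str]) -> None:
        if not proxy_urls:
            raise ValueError('At least one proxy URL is required.')
        # Endless iterator that walks the list in order and wraps around.
        self._urls = cycle(proxy_urls)
        # Remembers which proxy URL each session ID was paired with.
        self._sessions: dict[str, str] = {}

    def new_url(self, session_id: str | None = None) -> str:
        # Without a session, every call just advances the rotation.
        if session_id is None:
            return next(self._urls)
        # A new session takes the next URL; a known session reuses its URL.
        if session_id not in self._sessions:
            self._sessions[session_id] = next(self._urls)
        return self._sessions[session_id]


rotator = RoundRobinRotator(['http://proxy-1.com', 'http://proxy-2.com'])
first = rotator.new_url()
second = rotator.new_url()
sticky_a = rotator.new_url(session_id='a')
sticky_a_again = rotator.new_url(session_id='a')
```

The dictionary is what makes a session sticky: once 'a' is paired with a URL, later calls never touch the rotation for that session again.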
Apify proxy configuration
With Apify Proxy, you can select specific proxy groups to use, or countries to connect from. This allows you to get better proxy performance after some initial research.
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            groups=['RESIDENTIAL'],
            country_code='US',
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Proxy URL: {proxy_url}')
Now your connections using proxy_url will use only Residential proxies from the US. Note that you must first get access to a proxy group before you are able to use it. You can find your available proxy groups in the proxy dashboard.
If you don't specify any proxy groups, automatic proxy selection will be used.
Your own proxy configuration
There are two ways to make ProxyConfiguration work with your own proxies.
Either you can pass it a list of your own proxy servers:
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://proxy-1.com',
                'http://proxy-2.com',
            ],
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy URL: {proxy_url}')
Or you can pass it a function that generates proxy URLs automatically. The function accepts an optional session ID and, as in the example below, an optional request:
from __future__ import annotations
from apify import Actor, Request
async def custom_new_url_function(
    session_id: str | None = None,
    _: Request | None = None,
) -> str | None:
    if session_id is not None:
        return f'http://my-custom-proxy-supporting-sessions.com?session-id={session_id}'
    return 'http://my-custom-proxy-not-supporting-sessions.com'
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            new_url_function=custom_new_url_function,  # type: ignore[arg-type]
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url_with_session = await proxy_configuration.new_url('a')
        Actor.log.info(f'Using proxy URL: {proxy_url_with_session}')
        proxy_url_without_session = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy URL: {proxy_url_without_session}')
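A custom function does not have to encode the session in the URL. As a rough sketch, it could also map each session ID onto a fixed pool of servers, so that the same session always lands on the same proxy. Everything here is illustrative: PROXY_POOL and hashed_new_url_function are hypothetical names, the pool URLs are placeholders, and the second parameter is typed as object instead of the SDK's Request so that the snippet runs without the SDK installed.

```python
from __future__ import annotations

import asyncio
import hashlib

# Hypothetical proxy pool; replace with your own servers.
PROXY_POOL = [
    'http://proxy-1.com',
    'http://proxy-2.com',
    'http://proxy-3.com',
]


async def hashed_new_url_function(
    session_id: str | None = None,
    _request: object | None = None,
) -> str | None:
    """Pick a proxy deterministically from the session ID."""
    # Without a session there is nothing to pin, so use a fixed default.
    if session_id is None:
        return PROXY_POOL[0]
    # A stable hash (unlike the built-in hash()) gives the same index
    # for the same session ID across processes and runs.
    digest = hashlib.sha256(session_id.encode('utf-8')).digest()
    index = int.from_bytes(digest[:4], 'big') % len(PROXY_POOL)
    return PROXY_POOL[index]


async def demo() -> tuple[str | None, str | None]:
    # Two calls with the same session ID should agree.
    return (
        await hashed_new_url_function('a'),
        await hashed_new_url_function('a'),
    )


url_1, url_2 = asyncio.run(demo())
```

Such a function could then be passed as new_url_function in the same way as custom_new_url_function above. The trade-off is that changing the pool size remaps existing sessions.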
Configuring proxy based on Actor input
To make it easier to select the proxies that the Actor uses, you can add an input field whose editor is set to proxy to your input schema. This input will then be filled with a dictionary containing the proxy settings that you or the users of your Actor selected for the Actor run.
You can then use that input to create the proxy configuration:
from apify import Actor
async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}
        proxy_settings = actor_input.get('proxySettings')
        proxy_configuration = await Actor.create_proxy_configuration(
            actor_proxy_input=proxy_settings
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        Actor.log.info(f'Using proxy URL: {proxy_url}')
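For orientation, the sketch below shows what such a proxy settings dictionary might look like and how it relates to the arguments used earlier on this page. The key names (useApifyProxy, apifyProxyGroups, apifyProxyCountry, proxyUrls) are an assumption based on the Apify input schema, so verify them against the input schema documentation. The proxy_input_to_kwargs helper is hypothetical and only illustrates the relationship; when you pass actor_proxy_input, the SDK interprets the dictionary for you.

```python
from __future__ import annotations

from typing import Any


def proxy_input_to_kwargs(
    proxy_settings: dict[str, Any] | None,
) -> dict[str, Any]:
    """Translate a proxy input dictionary into keyword arguments.

    Illustrative only; the key names are assumed, not taken from this page.
    """
    if not proxy_settings:
        return {}
    if proxy_settings.get('useApifyProxy'):
        # Apify Proxy mode: only groups and country are relevant.
        kwargs: dict[str, Any] = {}
        if proxy_settings.get('apifyProxyGroups'):
            kwargs['groups'] = list(proxy_settings['apifyProxyGroups'])
        if proxy_settings.get('apifyProxyCountry'):
            kwargs['country_code'] = proxy_settings['apifyProxyCountry']
        return kwargs
    # Custom proxy mode: only the list of URLs is relevant.
    return {'proxy_urls': list(proxy_settings.get('proxyUrls', []))}


# Sample inputs, made up for this example.
apify_sample = {
    'useApifyProxy': True,
    'apifyProxyGroups': ['RESIDENTIAL'],
    'apifyProxyCountry': 'US',
}
custom_sample = {
    'useApifyProxy': False,
    'proxyUrls': ['http://proxy-1.com'],
}

apify_kwargs = proxy_input_to_kwargs(apify_sample)
custom_kwargs = proxy_input_to_kwargs(custom_sample)
```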
Using the generated proxy URLs
HTTPX
To use the generated proxy URLs with the httpx library, use the proxy argument:
import httpx
from apify import Actor
async def main() -> None:
    async with Actor:
        proxy_configuration = await Actor.create_proxy_configuration(
            proxy_urls=[
                'http://proxy-1.com',
                'http://proxy-2.com',
            ],
        )
        if not proxy_configuration:
            raise RuntimeError('No proxy configuration available.')
        proxy_url = await proxy_configuration.new_url()
        async with httpx.AsyncClient(proxy=proxy_url) as httpx_client:
            response = await httpx_client.get('http://example.com')
            Actor.log.info(f'Response: {response}')
Make sure you have the httpx library installed:
pip install httpx
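Some HTTP clients and browser automation tools expect the proxy host, port, and credentials as separate fields rather than as a single URL. A minimal sketch of splitting a proxy URL with the standard library follows. The sample URL and its credentials are made up, and the proxy_parts dictionary layout is only an illustration, not a format required by any particular library.

```python
from urllib.parse import urlsplit

# Made-up proxy URL with credentials, standing in for a generated one.
proxy_url = 'http://user:secret@proxy-1.com:8000'

parts = urlsplit(proxy_url)

# Collect the individual components for clients that need them separately.
proxy_parts = {
    'scheme': parts.scheme,
    'host': parts.hostname,
    'port': parts.port,
    'username': parts.username,
    'password': parts.password,
}
```

For a URL without credentials or an explicit port, the corresponding values are None, so check for that before passing them on.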