How At Helpchat We Scaled Celery To Send 5.6 Bn Push Messages To Mobile Devices In India

We use Celery extensively for all our task queues, and we divide the queues logically based on the type of work they do. The division has two levels:

First Level Division

  1. I/O-bound task queues (gevent pool)
  2. CPU-bound task queues (prefork pool)

Second Level Division

  1. Fast-running tasks
  2. Slow-running tasks
  3. High-priority tasks
  4. Low-priority tasks

This division lets us configure the Celery workers serving each queue (pool type and concurrency) for maximum throughput.
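The two-level split above can be expressed as queue routing plus one worker command per queue. The following is a minimal sketch, assuming Celery 4 or later (which uses the lowercase task_routes setting); every task name, queue name, and concurrency value in it is hypothetical, not our actual production configuration.

```python
# Hypothetical task and queue names; none of these come from our codebase.
# Queue names encode both levels: pool type (io/cpu), speed, and priority.
CELERY_QUEUE_CONFIG = {
    "task_routes": {
        "notifications.tasks.send_push": {"queue": "io_fast_high"},
        "notifications.tasks.sync_device_tokens": {"queue": "io_slow_low"},
        "analytics.tasks.build_segments": {"queue": "cpu_slow_low"},
    },
}

# One worker command per queue: gevent for I/O-bound work, prefork for CPU-bound.
WORKER_COMMANDS = {
    "io_fast_high": "celery -A proj worker -Q io_fast_high -P gevent -c 100",
    "io_slow_low": "celery -A proj worker -Q io_slow_low -P gevent -c 50",
    "cpu_slow_low": "celery -A proj worker -Q cpu_slow_low -P prefork -c 4",
}


def queue_for(task_name, default="celery"):
    """Return the queue a task is routed to; unrouted tasks go to Celery's default queue."""
    route = CELERY_QUEUE_CONFIG["task_routes"].get(task_name)
    return route["queue"] if route else default
```

In a real project the dict would be applied with app.conf.update(CELERY_QUEUE_CONFIG), and each command would run as a separate worker, so a flood of slow, low-priority work cannot starve the fast, high-priority queue.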

Some important lessons from using Celery over the past year:

  1. For I/O-bound queues, use gevent and make sure that your code does not make any blocking calls: a single blocking call stalls every greenlet in the worker process, which defeats the whole purpose. Refer: "What is gevent?" in the gevent 1.2.0.dev0 documentation.
  2. Gevent has increased the throughput of our I/O-bound queues considerably.

     Example from the Celery docs, where -P gevent selects the gevent pool and -c 100 runs up to 100 concurrent greenlets:

       celery -A proj worker -P gevent -c 100

  3. Use -Ofair so that tasks are handed only to pool processes that are free to run them. It comes with some coordination overhead, but the results are more predictable when tasks take different amounts of time to execute.
  4. Use the --maxtasksperchild argument to limit each worker process to n tasks if there is even a slight possibility of a memory leak. Celery then kills the process after n tasks and spawns a new one, so the memory is released and does not hamper the system's performance. (I am not saying that you should not fix your code :p)
  5. Use a global time limit so that your workers do not get stuck. You can set both a soft and a hard time limit; when the soft limit is reached, Celery raises SoftTimeLimitExceeded inside the task, so it can wrap up or log before it is killed at the hard limit.
  6. Use retries carefully. Always use exponential back-off, so that your systems do not keep calling a third-party API that is under maintenance for the next 30 minutes, and always set a maximum number of retries.
  7. Use a connection pool for databases instead of opening a new connection in the worker for every task.
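The max-tasks-per-child and time-limit advice above can also be set in configuration instead of on the command line. A minimal sketch, assuming Celery 4 or later setting names; the numbers are illustrative placeholders, not the values we ran in production. -Ofair is a worker command-line option, so it does not appear here.

```python
# Illustrative values only; tune them to your own task durations.
CELERY_WORKER_CONFIG = {
    # Replace each pool process after 1000 tasks so leaked memory is released.
    "worker_max_tasks_per_child": 1000,
    # Soft limit (seconds): Celery raises SoftTimeLimitExceeded inside the task.
    "task_soft_time_limit": 60,
    # Hard limit (seconds): the process running the task is killed and replaced.
    "task_time_limit": 90,
}


def validate(conf):
    """Check that the soft limit fires before the hard limit; return the cleanup window."""
    soft = conf["task_soft_time_limit"]
    hard = conf["task_time_limit"]
    if not 0 < soft < hard:
        raise ValueError("expected 0 < task_soft_time_limit < task_time_limit")
    return hard - soft  # seconds a task has to wrap up after the soft limit
```

Inside a task, an except block for celery.exceptions.SoftTimeLimitExceeded is where the wrap-up or logging would go; the hard limit cannot be caught, so anything still running when task_time_limit expires is simply killed.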
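A minimal sketch of exponential back-off with a retry cap, written as plain Python so the schedule is easy to inspect. The 30-second base delay, 30-minute cap, and five retries are assumptions chosen to match the maintenance-window example above. The commented task body uses Celery's Task.retry, while call_push_api and ApiUnavailable are hypothetical names.

```python
BASE_DELAY = 30      # seconds before the first retry (assumed value)
MAX_DELAY = 30 * 60  # never wait more than 30 minutes between attempts
MAX_RETRIES = 5      # give up after this many retries


def backoff_countdown(retries, base=BASE_DELAY, cap=MAX_DELAY):
    """Delay before the next attempt: base * 2**retries, capped at `cap`."""
    return min(cap, base * 2 ** retries)


def retry_schedule(max_retries=MAX_RETRIES):
    """Every delay a task would wait through before giving up."""
    return [backoff_countdown(n) for n in range(max_retries)]


# In a bound Celery task (bind=True), the delay could be applied like this;
# call_push_api and ApiUnavailable are hypothetical:
#
#     try:
#         call_push_api(payload)
#     except ApiUnavailable as exc:
#         raise self.retry(exc=exc,
#                          countdown=backoff_countdown(self.request.retries),
#                          max_retries=MAX_RETRIES)
```

With these numbers a task waits 30, 60, 120, 240 and 480 seconds (930 seconds in total) before giving up. Adding random jitter to each delay is a common refinement, so that many failed tasks do not all retry at the same instant.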
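A minimal sketch of a fixed-size connection pool using only the standard library. In-memory SQLite and the make_conn, get_pool and fetch_one helpers are hypothetical stand-ins for a real database driver and task; in practice the driver's own pool, or SQLAlchemy's, would normally be used instead.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, factory, size=5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def connection(self, timeout=5):
        # Wait up to `timeout` seconds for a free connection (raises queue.Empty).
        conn = self._pool.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection even if the task body raised.
            self._pool.put(conn)


def make_conn():
    # Hypothetical factory: in-memory SQLite stands in for a real database.
    return sqlite3.connect(":memory:", check_same_thread=False)


_POOL = None


def get_pool():
    # Create the pool lazily, once per process, and reuse it for every task.
    global _POOL
    if _POOL is None:
        _POOL = ConnectionPool(make_conn, size=2)
    return _POOL


def fetch_one(sql):
    # Hypothetical task body: borrow a pooled connection instead of opening one.
    with get_pool().connection() as conn:
        return conn.execute(sql).fetchone()
```

The pool is created lazily, on first use in each process. This matters with the prefork pool, because connections opened in the parent before forking would be shared by every child process; Celery's worker_process_init signal is another common place to set up such per-process resources.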

Finally, the biggest lesson of all: DNS lookups are slow if DNS is not configured properly.

We consistently saw average DNS lookup times of around 100 ms on our production machines. We fixed this by installing and configuring Unbound, a local caching DNS resolver, which reduced subsequent lookup times to less than 1 ms.
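A small, hypothetical helper for checking whether DNS resolution is a bottleneck on a worker host: it times repeated socket.getaddrinfo calls for one hostname. On a typical Linux host without a local cache, each call goes to the configured nameserver; with a caching resolver such as Unbound, the first lookup may still go upstream, but the following ones should be answered from the cache.

```python
import socket
import statistics
import time


def time_lookups(host, port=443, runs=5):
    """Resolve `host` `runs` times and return each lookup's latency in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies


def summarize(samples):
    """First lookup (possibly uncached) versus the median of all lookups."""
    return {"first_ms": samples[0], "median_ms": statistics.median(samples)}
```

For example, running summarize(time_lookups("api.example.com")) before and after setting up the cache shows whether cached lookups actually drop to around a millisecond; api.example.com is a placeholder for a host your tasks really call.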

We also raised the net.ipv4.ip_conntrack_max sysctl setting so that we don't drop packets when thousands of TCP connections are open simultaneously.
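A minimal monitoring sketch for the connection-tracking table, assuming a Linux host with the nf_conntrack module loaded. Current kernels expose the limit as net.netfilter.nf_conntrack_max (older kernels used the net.ipv4.ip_conntrack_max name mentioned above); the 80% alert threshold is an arbitrary assumption.

```python
from pathlib import Path

# Paths used by current kernels; older kernels used net.ipv4.ip_conntrack_max.
CONNTRACK_MAX = Path("/proc/sys/net/netfilter/nf_conntrack_max")
CONNTRACK_COUNT = Path("/proc/sys/net/netfilter/nf_conntrack_count")


def read_int(path):
    """Read an integer sysctl value, or None if it is missing or unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def conntrack_utilization(count, maximum):
    """Fraction of the connection-tracking table currently in use."""
    if not maximum:
        raise ValueError("maximum must be a positive integer")
    return count / maximum


def check(threshold=0.8):
    """Return (usage, over_threshold), or None if conntrack stats are unavailable."""
    count, maximum = read_int(CONNTRACK_COUNT), read_int(CONNTRACK_MAX)
    if count is None or maximum is None:
        return None  # not Linux, or the conntrack module is not loaded
    usage = conntrack_utilization(count, maximum)
    return usage, usage >= threshold
```

When the table fills up, the kernel drops packets for new connections and typically logs "nf_conntrack: table full, dropping packet". If usage stays near the threshold, the limit can be raised with sysctl, for example sysctl -w net.netfilter.nf_conntrack_max=<new value>, and persisted in /etc/sysctl.conf.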

[Screenshot of a production queue being consumed by the Celery workers]

What we did next to achieve even higher throughput (around 8,000 tasks per second) was to use Node.js as our consumer, since Node.js has excellent async features. (I will write about it in my next post.)

If you want to get into more detail, please feel free to ping me.

Cheers!!!

Note: The views and opinions expressed are solely those of the author and do not necessarily reflect the views held by Inc42, its creators or employees. Inc42 is not responsible for the accuracy of any of the information supplied by guest bloggers.
