Install Apache Kafka on Ubuntu 14.04

This post shows you how to install Kafka easily. Kafka needs ZooKeeper to run, so we will install ZooKeeper first and then Kafka. You can find download links for the latest versions of Kafka and ZooKeeper on the Apache Foundation website.

Install Java (if needed)
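
Kafka and ZooKeeper both run on the JVM. On Ubuntu 14.04 the OpenJDK 7 packages are the quickest route (any recent JDK will do):

    sudo apt-get update
    sudo apt-get install -y openjdk-7-jdk
    # verify the install
    java -version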

1. Install Zookeeper

You can get the binary download link here: http://zookeeper.apache.org/releases.html#download
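
For example, to grab and start a release (the version number below is only an illustration; pick the latest one from the link above):

    wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
    tar -xzf zookeeper-3.4.6.tar.gz
    cd zookeeper-3.4.6
    # the sample config is enough for a single-node setup
    cp conf/zoo_sample.cfg conf/zoo.cfg
    bin/zkServer.sh start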

2. Install Kafka

You can get the binary download link here: http://kafka.apache.org/downloads.html
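
Again, the version below is only an illustration; download the latest release from the link above:

    wget https://archive.apache.org/dist/kafka/0.8.2.1/kafka_2.10-0.8.2.1.tgz
    tar -xzf kafka_2.10-0.8.2.1.tgz
    cd kafka_2.10-0.8.2.1
    # ZooKeeper must already be running (see step 1)
    bin/kafka-server-start.sh config/server.properties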

And if you want Kafka to run on startup and restart automatically whenever it crashes, you should use Upstart on Ubuntu 14.04. Below is a minimal sketch of kafka.conf; the install path /opt/kafka is my assumption, so adjust it to wherever you unpacked Kafka:
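
    # /etc/init/kafka.conf -- a sketch; /opt/kafka is an assumed path
    description "Apache Kafka broker"

    start on runlevel [2345]
    stop on runlevel [016]

    # restart the broker automatically if it dies
    respawn
    respawn limit 10 5

    exec /opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties

With this in place, Upstart manages the broker via sudo start kafka and sudo stop kafka.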

 

A simple example for Python Kafka Avro

Over the weekend, I tried to use Python to write a producer and a consumer for Apache Kafka. I found the kafka-python library, which makes that part easy. However, sending Avro data from producer to consumer is not so easy: you have to understand both, and while there are plenty of specifications, there is little example source code. So this is a simple example that creates a producer (producer.py) and a consumer (consumer.py) to stream Avro data via Kafka in Python.

The wise man never knows all, only fools know everything.

To run this source code, please make sure that you have installed Kafka (https://sonnguyen.ws/install-apache-kafka-in-ubuntu-14-04/) and the Python libraries kafka-python and avro (io is part of the standard library). I am using Python 2.7.
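
Both libraries are on PyPI:

    pip install kafka-python avro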

Create producer.py
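
A minimal sketch using kafka-python's KafkaProducer; the schema file user.avsc, its fields, and the topic name are placeholders of mine:

    import io

    import avro.io
    import avro.schema
    from kafka import KafkaProducer

    # placeholder schema file -- replace with your own Avro schema
    schema = avro.schema.parse(open("user.avsc").read())

    producer = KafkaProducer(bootstrap_servers='localhost:9092')

    # serialize one record to Avro binary
    writer = avro.io.DatumWriter(schema)
    bytes_writer = io.BytesIO()
    encoder = avro.io.BinaryEncoder(bytes_writer)
    writer.write({"name": "sonnguyen", "favorite_number": 7}, encoder)

    # the raw bytes become the Kafka message value
    producer.send('my-topic', bytes_writer.getvalue())
    producer.flush()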

Create consumer.py
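
And a matching sketch for the consumer; it must parse the same schema the producer wrote with:

    import io

    import avro.io
    import avro.schema
    from kafka import KafkaConsumer

    # must match the producer's schema
    schema = avro.schema.parse(open("user.avsc").read())

    consumer = KafkaConsumer('my-topic', bootstrap_servers='localhost:9092')

    for message in consumer:
        # decode each message value from Avro binary back to a dict
        bytes_reader = io.BytesIO(message.value)
        decoder = avro.io.BinaryDecoder(bytes_reader)
        reader = avro.io.DatumReader(schema)
        print reader.read(decoder)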

Time to test:
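
Start the consumer in one terminal, then run the producer in another:

    python consumer.py
    python producer.py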

I hope that this post will help you say “Hello” to Kafka, Python and Avro.

Please see the details in GitHub: https://github.com/thanhson1085/python-kafka-avro

In the source code repository above, I also created consumer_bottledwater-pg.py to decode Avro data pushed from the bottledwater-pg Kafka producer. This is based on a question on Stack Overflow.

Logrotate Nginx Custom Logs

As you know, when you create a site on the nginx web server, you will want to add custom logs for that site. This post is a small tip to help you do it properly.

In your nginx site config (e.g. /etc/nginx/sites-available/example.com), you add custom logs as below (the log paths are just my example choices):
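
    server {
        listen 80;
        server_name example.com;
        root /var/www/example.com;

        # per-site custom logs
        access_log /var/log/nginx/example.com.access.log;
        error_log  /var/log/nginx/example.com.error.log;
    }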

Your log files will now grow bigger day by day, and you will have to remember to delete them if you do not want your hard disk to fill up. What should you do? Fortunately, logrotate will help you compress, clean, and back them up daily.

You just have to add (append) the content below to /etc/logrotate.d/nginx:
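
A typical stanza looks something like this; tune the rotation count and schedule to taste:

    /var/log/nginx/example.com.*.log {
        daily
        missingok
        rotate 14
        compress
        delaycompress
        notifempty
        sharedscripts
        postrotate
            # tell nginx to reopen its log files
            [ -f /var/run/nginx.pid ] && kill -USR1 $(cat /var/run/nginx.pid)
        endscript
    }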

Finally, run the command below to force the update.
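
    sudo logrotate -f /etc/logrotate.d/nginx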

It is done.

How to save a file to HDFS with Python

This source code is a simple example of how to upload an image and save it to HDFS. The program connects to HDFS via WebHDFS. It is actually easier than you think; the most difficult part is preparing the environment to test your source code.

Help people, even when you know they can’t help you back

Prerequisites:

  • python3
  • virtualenv (optional)
  • pip3
  • Flask==0.10.1
  • Flask-Swagger
  • Hadoop with webhdfs

Installation:

To install Hadoop, please take a look at https://github.com/thanhson1085/docker-cloudera-quickstart

I will show you the details, in the hope that it will save you time.

First, use Docker to pull docker-cloudera-quickstart to your local machine (the Docker Hub image name below is my assumption from the repo name):
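
    docker pull thanhson1085/docker-cloudera-quickstart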

Then run that image, and please do not forget to expose the WebHDFS ports (50070, 50075):
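
    # the container name "hdfs" is my own choice, referenced again below
    docker run -d --name hdfs -p 50070:50070 -p 50075:50075 thanhson1085/docker-cloudera-quickstart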

There are two ways to run my source code. I recommend using Docker.

Using Docker
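
A sketch, assuming the app image is published as thanhson1085/flask-webhdfs and listens on port 5000:

    docker pull thanhson1085/flask-webhdfs
    docker run -d -p 5000:5000 thanhson1085/flask-webhdfs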

Install in the normal way

There is a trick here. You have to edit the /etc/hosts file so that your machine knows where the HDFS server is. You can use the “docker inspect” command to get the HDFS server's IP address.
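
For example, with the container named hdfs as above:

    # print the container's IP address
    docker inspect -f '{{ .NetworkSettings.IPAddress }}' hdfs

    # then map that IP to the hostname the HDFS server reports, e.g. in /etc/hosts:
    # 172.17.0.2   quickstart.cloudera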

Finally, you have an HDFS server with WebHDFS support.

Clone this source code to your local
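
    # repo URL from the GitHub link below
    git clone https://github.com/thanhson1085/flask-webhdfs.git
    cd flask-webhdfs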

Run the commands below to install dependencies and create an environment to run the application:
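
A sketch; the requirements.txt and app.py names are my assumptions about the repo layout:

    virtualenv -p python3 venv
    source venv/bin/activate
    pip install -r requirements.txt
    python app.py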

Test

I integrated Swagger into this app, so you can use the Swagger UI to test it. You can find the Swagger schema at http://localhost:5000/docs

After uploading the files, you can use the hadoop command to look them up in HDFS.

Go to Hadoop server:
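
    # assuming HDFS runs in the container named "hdfs" from earlier
    docker exec -it hdfs bash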

List files in hdfs:
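
    # adjust the path to wherever the app writes uploads
    hadoop fs -ls /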

The output should be:

Github: https://github.com/thanhson1085/flask-webhdfs

Flask RabbitMQ Celery example

If you are learning how to work with RabbitMQ and Celery, this source code may help you. I wrote a small example that uploads an image to a web server and uses Celery to generate a thumbnail.

Difficult doesn’t mean impossible. It simply means that you have to work hard

This source code supports two ways to run it.

You can check the source code in Github: flask-celery-rabbitmq-generate-thumbnail

And the image at docker: flask-celery-rabbitmq-example

First, clone this source code to your local machine:
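
    # assuming the repo lives under the same GitHub account as the others
    git clone https://github.com/thanhson1085/flask-celery-rabbitmq-generate-thumbnail.git
    cd flask-celery-rabbitmq-generate-thumbnail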

Using Docker

  1. Build from Dockerfile
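
    # run from the repo root; the tag is my own choice
    docker build -t flask-celery-rabbitmq-example .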

Or pull from Docker Repo
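
    # the account prefix is my assumption; the image name is from the Docker link above
    docker pull thanhson1085/flask-celery-rabbitmq-example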

Run Docker image
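
    # port 5000 matches the test URL below
    docker run -d -p 5000:5000 thanhson1085/flask-celery-rabbitmq-example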

After running the docker image, you should wait for the output as below:

Install packages normally (Ubuntu 14.04)

I will show you how to run this source code from scratch. I am using Ubuntu Server 14.04 with virtualenv and pip installed.

Install RabbitMQ Server:
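
    sudo apt-get update
    sudo apt-get install -y rabbitmq-server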

Fix the PIL issue
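
A common fix is installing the image libraries PIL builds against (the exact set can vary):

    sudo apt-get install -y libjpeg-dev zlib1g-dev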

Create environment to run the application with virtualenv:
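
    virtualenv venv
    source venv/bin/activate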

Install all required packages:
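
    # assuming the repo ships a requirements.txt
    pip install -r requirements.txt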

Run web server to upload files
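
    # the entry-point name app.py is my assumption
    python app.py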

Run the “generate thumbnail” task in Celery
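
A sketch; point -A at wherever the Celery app object is defined (app.celery is my placeholder):

    celery -A app.celery worker --loglevel=info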

Now it is ready for testing (http://localhost:5000). The page should look like the image below:

Generate thumbnail with celery rabbitmq