Creating Kubernetes Jobs.

Sometimes you need to run a container to execute a specific task and then stop it.

Normally in Kubernetes, if you just run it with kubectl run, it will actually create a Deployment,
meaning your container will be restarted every time it exits. That is because by default kubectl run uses the ‘--restart=Always’ policy.
So if you don’t want to write a YAML file where you specify the resource ‘kind’ as Job, but simply want to use kubectl run, you can set the restart policy to ‘OnFailure’.
Let’s run a simple container as a job. It is a simple web crawler which I wrote for one of my job interviews; it has many bugs and is incomplete,
but sometimes it actually works 🙂 So let’s run it:

➜  ~ kubectl run crawler --restart=OnFailure --image=kayan/web-crawler \
 -- http://www.gamesyscorporate.com http://www.gamesyscorporate.com 3
job "crawler" created
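
For comparison, the YAML route mentioned earlier could look roughly like the sketch below. It writes a minimal Job manifest that mirrors the kubectl run command above; the file name crawler-job.yaml is my own choice, and the fields are only the bare minimum I would expect a Job to need, not a tuned configuration.

```shell
# Write a minimal Job manifest to the current directory.
# The file name is hypothetical; image and args are copied from the kubectl run example.
cat > crawler-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: crawler
spec:
  template:
    spec:
      # A Job's pod template only accepts OnFailure or Never here.
      restartPolicy: OnFailure
      containers:
      - name: crawler
        image: kayan/web-crawler
        args:
        - "http://www.gamesyscorporate.com"
        - "http://www.gamesyscorporate.com"
        - "3"
EOF
```

You would then submit it with kubectl create -f crawler-job.yaml instead of kubectl run.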

Now we can check the state of the pod:

➜  ~ kubectl get pod
NAME            READY     STATUS              RESTARTS   AGE
crawler-k57bh   0/1       ContainerCreating   0          2s

It will take a while, as it needs to download the image first. To check on progress, run:

kubectl describe pod crawler

You should see something like below:

➜  ~ kubectl describe pod crawler
Name:		crawler-k57bh
Namespace:	default
Node:		minikube/192.168.99.100
Start Time:	Sun, 26 Nov 2017 22:20:28 +0000
Labels:		controller-uid=01e62349-d2f8-11e7-accd-08002739d83b
		job-name=crawler
		run=crawler
Annotations:	kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"Job","namespace":"default","name":"crawler","uid":"01e62349-d2f8-11e7-accd-08002739d83b","apiVersion":"bat...
Status:		Running
IP:		172.17.0.5
Created By:	Job/crawler
Controlled By:	Job/crawler
Containers:
  crawler:
    Container ID:	docker://059c0a13f7963401a6e20eb7d6eb540a77e75384ba4d7311b435f88b8a00b669
    Image:		kayan/web-crawler
    Image ID:		docker-pullable://kayan/web-crawler@sha256:e2eb8c3b16d4749ab7f2c0708424e407aa82d487ac800ee05ecf87a4c9c54243
    Port:		<none>
    Args:
      http://www.gamesyscorporate.com
      http://www.gamesyscorporate.com
      3
    State:		Running
      Started:		Sun, 26 Nov 2017 22:20:31 +0000
    Ready:		True
    Restart Count:	0
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-r89kc (ro)
Conditions:
  Type		Status
  Initialized 	True
  Ready 	True
  PodScheduled 	True
Volumes:
  default-token-r89kc:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-r89kc
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From			SubObjectPath			Type		Reason			Message
  ---------	--------	-----	----			-------------			--------	------			-------
  40s		40s		1	default-scheduler					Normal		Scheduled		Successfully assigned crawler-k57bh to minikube
  40s		40s		1	kubelet, minikube					Normal		SuccessfulMountVolume	MountVolume.SetUp succeeded for volume "default-token-r89kc"
  40s		40s		1	kubelet, minikube	spec.containers{crawler}	Normal		Pulling			pulling image "kayan/web-crawler"

The last line shows the most recent event, pulling image "kayan/web-crawler". Once the pull is done, the kubelet should create and start the container, so if you describe the pod again later you should see more events:

  38s		38s		1	kubelet, minikube	spec.containers{crawler}	Normal		Pulled			Successfully pulled image "kayan/web-crawler"
  38s		38s		1	kubelet, minikube	spec.containers{crawler}	Normal		Created			Created container
  37s		37s		1	kubelet, minikube	spec.containers{crawler}	Normal		Started			Started container

Now I would like to check the job’s state:

➜  ~ kubectl get job
NAME      DESIRED   SUCCESSFUL   AGE
crawler   1         0            8s

As you can see, the job is still running; once it finishes, SUCCESSFUL will change to 1:

➜  ~ kubectl get job
NAME      DESIRED   SUCCESSFUL   AGE
crawler   1         1            1m
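
Rather than re-running kubectl get job by hand, you could poll for completion. The sketch below is one possible way to do it: wait_for_job is a hypothetical helper name, and it assumes the job's .status.succeeded field is what the SUCCESSFUL column reflects, which is how I understand it but have not verified against every kubectl version.

```shell
# Poll a Job until it reports at least one succeeded pod, or give up.
# Usage: wait_for_job <job-name> [max-tries]   (hypothetical helper)
wait_for_job() {
  wfj_name="$1"
  wfj_tries="${2:-60}"
  while [ "$wfj_tries" -gt 0 ]; do
    # The field is empty until a pod has succeeded, so default to 0 below.
    wfj_succeeded="$(kubectl get job "$wfj_name" -o jsonpath='{.status.succeeded}')"
    if [ "${wfj_succeeded:-0}" -ge 1 ]; then
      echo "job $wfj_name finished"
      return 0
    fi
    wfj_tries=$((wfj_tries - 1))
    sleep 2
  done
  echo "job $wfj_name did not finish in time" >&2
  return 1
}
```

Calling wait_for_job crawler would then block until the job completes, checking every two seconds for up to 60 tries.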

Once the job is done, the pod will disappear from the default listing:

➜  ~ kubectl get pod
No resources found, use --show-all to see completed objects.

But hold on, how will we check the logs of a completed pod if it is not listed?

Thankfully, there is a way to show terminated pods: the ‘-a’ argument, short for ‘--show-all’:

➜  ~ kubectl get pod -a
NAME            READY     STATUS      RESTARTS   AGE
crawler-qmzpp   0/1       Completed   0          10h
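
Since the describe output earlier shows that the pod carries a job-name=crawler label, another way to find it is a label selector. This is only a sketch: job_pods is a hypothetical helper name, and depending on your kubectl version you may still need to add -a to the get call for completed pods to show up.

```shell
# Print the names of all pods that belong to a given Job,
# using the job-name label that the Job controller puts on its pods.
# Usage: job_pods <job-name>   (hypothetical helper)
job_pods() {
  kubectl get pods --selector="job-name=$1" \
    -o jsonpath='{.items[*].metadata.name}'
}
```

For example, job_pods crawler should print the generated pod name, which you can then pass to kubectl logs.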

Now, knowing the name of the pod, we can show its logs (pod names get a random suffix, so use the name from your own listing):

➜  ~ kubectl logs  crawler-97d6q | grep "Java Developer"

<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=923720" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493517" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493500" target="_blank"><span class="title">Java Developer - Cloud Platform</span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=721812" target="_blank"><span class="title">Java Developer - Games Development</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=927697" target="_blank"><span class="title">Java Developer - Games Development (Bingo)</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=874603" target="_blank"><span class="title">Senior Java Developer - Payments</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=923720" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493517" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493500" target="_blank"><span class="title">Java Developer - Cloud Platform</span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=721812" target="_blank"><span class="title">Java Developer - Games Development</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=927697" target="_blank"><span class="title">Java Developer - Games Development (Bingo)</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=874603" target="_blank"><span class="title">Senior Java Developer - Payments</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=923720" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493517" target="_blank"><span class="title">Java Developer </span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=493500" target="_blank"><span class="title">Java Developer - Cloud Platform</span><span class="location">Estonia </span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=721812" target="_blank"><span class="title">Java Developer - Games Development</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=927697" target="_blank"><span class="title">Java Developer - Games Development (Bingo)</span><span class="location">London, Piccadilly</span></a>
<a href="http://www.gamesyscorporate.com/careers/jobs/?gh_jid=874603" target="_blank"><span class="title">Senior Java Developer - Payments</span><span class="location">London, Piccadilly</span></a>

As you can see, I retrieved the logs from the web crawler and then ran grep to get the latest “Java Developer” vacancies available 🙂
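
The output above lists each vacancy several times. If you only want each matching line once, one option is to pipe the grep result through sort -u. The helper name unique_matches below is made up for this example.

```shell
# Keep only the lines that match a pattern and drop exact duplicates.
# Reads log text on stdin, e.g. piped from: kubectl logs <pod-name>
# Usage: unique_matches <pattern>   (hypothetical helper)
unique_matches() {
  grep -- "$1" | sort -u
}
```

Usage would be something like kubectl logs crawler-97d6q | unique_matches "Java Developer". Note that sort -u also reorders the lines alphabetically, which is fine here since the order of the vacancies does not matter.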

If you run the job again, with the same name, it will fail:

➜  ~ kubectl run crawler --restart=OnFailure --image=kayan/web-crawler -- http://www.gamesyscorporate.com http://www.gamesyscorporate.com 3
Error from server (AlreadyExists): jobs.batch "crawler" already exists
➜  ~

To run it again, you need to delete the job first. This might look strange, but even though the pod is no longer listed, the job object is still there:

➜  ~ kubectl delete job crawler
job "crawler" deleted
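
If you re-run jobs often, the delete-then-create steps can be wrapped in a small function. This is a sketch under a few assumptions: rerun_job is a hypothetical name, it expects the job to be defined in a manifest file that you supply, and it relies on the --ignore-not-found flag of kubectl delete so that the first run does not fail when there is nothing to delete yet.

```shell
# Delete a Job if it exists, then create it again from a manifest file.
# Usage: rerun_job <job-name> <manifest-file>   (hypothetical helper)
rerun_job() {
  # Only create the new Job if the delete step succeeded.
  kubectl delete job "$1" --ignore-not-found &&
    kubectl create -f "$2"
}
```

Keep in mind that deleting a job normally removes its pods as well, so grab any logs you still need before calling it.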

Now let’s run it again and, this time, follow the logs in real time as the job runs:

➜  ~ kubectl logs -f crawler-97d6q
crawling to: http://www.gamesyscorporate.com with domain: http://www.gamesyscorporate.com with number of threads: 3
jobs: 42
processed link: 4
jobs: 48
processed link: 6
jobs: 47
...
....
p