Friday, March 26, 2021

Python Study notes: How to run Scala and Spark in the Jupyter notebook

This post is a short tutorial on running Scala and Spark in a Jupyter notebook with spylon-kernel, which can also be used for general Scala development. spylon-kernel is an additional Jupyter kernel that has to be installed separately.
## Prerequisites
* Apache Spark 2.1.1 compiled for Scala 2.11
* Jupyter Notebook
* Python 3.5+
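
Before installing anything, it can help to check these prerequisites from Python. The following is only a minimal sketch: the helper name `check_prerequisites` is made up for illustration, and it assumes that Spark is located through the `SPARK_HOME` environment variable, which may not match your setup. It does not verify the Spark or Scala version.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a dict mapping each prerequisite to True or False.

    Hypothetical helper for illustration; not part of spylon-kernel.
    """
    env = os.environ if env is None else env
    return {
        # the prerequisites list asks for Python 3.5 or newer
        "python_3_5_plus": sys.version_info >= (3, 5),
        # the `jupyter` launcher should be on PATH
        "jupyter_on_path": shutil.which("jupyter") is not None,
        # assumption: Spark is found through the SPARK_HOME variable
        "spark_home_set": bool(env.get("SPARK_HOME")),
    }


for name, ok in check_prerequisites().items():
    print("{}: {}".format(name, "ok" if ok else "missing"))
```

Any entry reported as `missing` is worth fixing before moving on to the steps below.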

Step1: install the package using `pip` or `conda`

```bash
pip install spylon-kernel
# or
conda install -c conda-forge spylon-kernel
```

Step2: create a kernel spec

This registers the kernel with Jupyter so that we can select the Scala kernel in the notebook.

```bash
python -m spylon_kernel install
```
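
To confirm that the kernel spec was registered, one option is to inspect the output of `jupyter kernelspec list --json`. The sketch below assumes the kernel is registered under the name `spylon-kernel` (the name shown in the notebook menu); the helper names `has_kernel` and `installed_kernels_json` are hypothetical.

```python
import json
import subprocess


def has_kernel(listing_json, name="spylon-kernel"):
    """Return True if `name` is a key under "kernelspecs" in the JSON listing.

    The default name is an assumption; adjust it if your install differs.
    """
    specs = json.loads(listing_json).get("kernelspecs", {})
    return name in specs


def installed_kernels_json():
    """Run `jupyter kernelspec list --json` and return its raw text output."""
    completed = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.stdout


# Hand-written sample in the same shape as the real listing
sample = '{"kernelspecs": {"python3": {}, "spylon-kernel": {}}}'
print(has_kernel(sample))  # True
```

On a machine with Jupyter installed, `has_kernel(installed_kernels_json())` performs the same check against the real listing.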

Step3: start the Jupyter notebook

```bash
jupyter notebook
```

Step4: in the notebook, select New -> spylon-kernel. This will start our Scala kernel.

Step5: test the notebook

```scala
val x = 2
val y = 3
x+y
```

Test: run Python code with the `%%python` cell magic:

```python
%%python
x=2
print(x)
```

Test: we can even use Spark to create a dataset:

```scala
val data = Seq((1,2,3), (4,5,6), (6,7,8), (9,19,10))
val ds = spark.createDataset(data)
ds.show()
```
