Friday, March 26, 2021

Python Study notes: How to run Scala and Spark in the Jupyter notebook

Here we provide a short tutorial on running Scala in a Jupyter notebook with the spylon-kernel, an additional kernel that has to be installed separately. Besides Spark work, it can also be used for general Scala development.
## Prerequisites
* Apache Spark 2.1.1 compiled for Scala 2.11
* Jupyter Notebook
* Python 3.5+

Step 1: install the package using `pip` or `conda`

```
pip install spylon-kernel
# or
conda install -c conda-forge spylon-kernel
```

Step 2: create a kernel spec
This registers the kernel with Jupyter so that we can select the Scala kernel in the notebook.

```
python -m spylon_kernel install
```
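To confirm that the kernel spec was created, we can ask Jupyter to list the kernels it knows about; a `spylon-kernel` entry should appear (the exact listing depends on your Jupyter installation):

```
jupyter kernelspec list
```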

Step 3: start the Jupyter notebook (`ipython notebook` is deprecated; use the `jupyter` command instead)

```
jupyter notebook
```

Step 4: in the notebook, select New -> spylon-kernel. This starts our Scala kernel.

Step 5: test the notebook

```scala
val x = 2
val y = 3
```

Test: we can also run Python code from the same notebook.
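Since spylon-kernel is built on metakernel, a `%%python` cell magic is available for mixing Python cells into the Scala notebook (assuming the default metakernel magics are enabled); a minimal sketch:

```python
%%python
x = 2
print(x)
```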

Test: we can even use Spark to create a Dataset (the kernel initializes the `spark` session lazily on first use):

```scala
val data = Seq((1, 2, 3), (4, 5, 6), (6, 7, 8), (9, 19, 10))
val ds = spark.createDataset(data)
```
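Once the Dataset exists, the standard Spark Dataset API can be used to inspect and transform it; a small sketch continuing from the `ds` defined above (assumes the tuple encoders from `spark.implicits._`, which the kernel imports automatically, as in spark-shell):

```scala
ds.show()       // prints the 4 rows as a three-column table (_1, _2, _3)
ds.count()      // returns 4

// Transform: sum the fields of each tuple
val sums = ds.map(t => t._1 + t._2 + t._3)
sums.collect()  // Array(6, 15, 21, 38)
```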
