
Spark SQL - take your output column names directly from your case classes

Background

Hardcoding the list of columns for your output datasets duplicates information that already lives in your domain model. Once your domain is correctly modelled as case classes, you can use the function below to extract the attribute names from a case class and use them directly as output columns.

Example Domain (case classes)


  case class Food(
                  fat: Int,
                  protein: Int,
                  carbohydrates: Int,
                  vitamins: List[String],
                  minerals: List[String]
                  )
      

Function


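  import org.apache.spark.sql.Column
  import org.apache.spark.sql.functions.col

  // Build one Column per field declared on the given case class.
  def extractCaseClassAttribuesAsColumnsList[T](c: Class[T]): Seq[Column] = {
    c
      .getDeclaredFields
      .map(f => col(f.getName))
      .toSeq
  }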
      
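The reflection step can be tried without Spark on the classpath. The following is a minimal sketch, not part of the original function: `FieldNames` and `caseClassFieldNames` are hypothetical names, and the `isSynthetic` filter is an added assumption to skip compiler-generated fields (such as `$outer`, which appears when a case class is nested inside another class).

```scala
// Same domain class as in the example above.
case class Food(
  fat: Int,
  protein: Int,
  carbohydrates: Int,
  vitamins: List[String],
  minerals: List[String]
)

// Hypothetical helper: returns the attribute names as plain strings.
object FieldNames {
  def caseClassFieldNames[T](c: Class[T]): Seq[String] =
    c.getDeclaredFields
      .filterNot(_.isSynthetic) // assumption: drop compiler-generated fields
      .map(_.getName)
      .toSeq
}
```

Calling `FieldNames.caseClassFieldNames(classOf[Food])` should return the five attribute names of `Food`; wrapping each name in `col(...)` gives the same result as the function above. Note that the Java documentation for `getDeclaredFields` does not guarantee any particular order, so avoid relying on declaration order if column order matters.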

Calling The Function


  val outputColumns: Seq[Column] = extractCaseClassAttribuesAsColumnsList(classOf[Food])
   
  dataset
    .select(outputColumns: _*)
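For an end-to-end picture, here is a sketch of the same idea applied to a small in-memory dataset. It swaps in a different technique from the one above: the column names come from Spark's `Encoders.product[Food].schema` rather than from Java reflection. The object name `SelectFoodColumns`, the sample row, and the extra `source` column are all made up for illustration, and a local Spark installation is assumed.

```scala
import org.apache.spark.sql.{Column, Encoders, SparkSession}
import org.apache.spark.sql.functions.col

case class Food(
  fat: Int,
  protein: Int,
  carbohydrates: Int,
  vitamins: List[String],
  minerals: List[String]
)

object SelectFoodColumns {
  // Column names derived from the case class via its encoder schema.
  val outputColumnNames: Seq[String] =
    Encoders.product[Food].schema.fieldNames.toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("food-columns") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input carrying one column that is not part of Food.
    val raw = Seq(
      (10, 20, 30, List("A", "C"), List("iron"), "unwanted")
    ).toDF("fat", "protein", "carbohydrates", "vitamins", "minerals", "source")

    val outputColumns: Seq[Column] = outputColumnNames.map(col)

    // Keep only the columns that Food declares, then read rows as Food.
    val foods = raw.select(outputColumns: _*).as[Food]
    foods.show()

    spark.stop()
  }
}
```

Either approach keeps the column list in one place: adding an attribute to `Food` changes the selected columns without touching the `select` call.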