In machine learning, attributes are the data elements a model takes as input. They are also called fields, features, or variables. In a predictive model, attributes are the predictors that influence the outcome; in a descriptive model, they are the items analyzed for patterns or associations.
Model and Data Attributes
Model attributes are the items of data the model uses internally, while data attributes are the columns used to build, test, or score the model. The two often coincide: a column named "SIZE" with values such as M, L, and X is both a data attribute and a model attribute. A nested column is different. Suppose a nested column called "SALE" holds the sales figures for a group of products. SALE is a data attribute but not a model attribute; instead, each product with its associated sales figure (each row in the nested column) is treated as a separate model attribute.
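The relationship between a nested data attribute and its model attributes can be sketched as follows. This is a minimal illustration, not any product's actual behavior: the case table row, the CUST_ID column, the product names, the helper `expand_model_attributes`, and the "<column>.<subname>" naming convention are all assumptions made for the example.

```python
# Hypothetical case-table row: one scalar column (SIZE) and one nested
# column (SALE) that holds a sales figure per product.
case_row = {
    "CUST_ID": 101,
    "SIZE": "M",
    "SALE": {"shirt": 120.0, "hat": 35.5},  # nested column
}


def expand_model_attributes(row, nested_columns, case_id="CUST_ID"):
    """Flatten one row into model attributes.

    Scalar columns map one-to-one. Each entry of a nested column becomes
    its own model attribute, named "<column>.<subname>" (an assumed
    convention used only in this sketch).
    """
    attributes = {}
    for name, value in row.items():
        if name == case_id:
            continue  # the case ID identifies the row; it is not an attribute
        if name in nested_columns:
            for sub_name, sub_value in value.items():
                attributes[f"{name}.{sub_name}"] = sub_value
        else:
            attributes[name] = value
    return attributes


model_attributes = expand_model_attributes(case_row, nested_columns={"SALE"})
print(model_attributes)
```

Here the row has two data attributes (SIZE and SALE) but three model attributes: SIZE, SALE.shirt, and SALE.hat.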
Transformations can also make model attributes differ from data attributes. For instance, a transformation could perform a calculation on two data attributes and store the result in a new attribute. This derived attribute is a model attribute with no corresponding data attribute. Other transformations, such as outlier treatment and normalization, change how the model represents an attribute, so its values inside the model no longer match the values in the original column.
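A small sketch of both kinds of transformation, under stated assumptions: the PRICE and QUANTITY columns, the derived REVENUE attribute, and the choice of min-max scaling as the normalization method are illustrative and not taken from the text.

```python
# Hypothetical data attributes: PRICE and QUANTITY.
rows = [
    {"PRICE": 10.0, "QUANTITY": 3},
    {"PRICE": 4.0, "QUANTITY": 10},
    {"PRICE": 25.0, "QUANTITY": 1},
]

# Derived model attribute: computed from two data attributes and stored
# under a new name. No REVENUE column exists in the source data.
for row in rows:
    row["REVENUE"] = row["PRICE"] * row["QUANTITY"]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


# Normalization changes the representation of an attribute, not its name.
revenue_norm = min_max_normalize([row["REVENUE"] for row in rows])
print(revenue_norm)
```

The model would see the normalized values, while the case table still contains only PRICE and QUANTITY.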
Target Attribute
The target of a supervised model is a special kind of attribute. In the training data, the target column holds the historical values used to train the model. In the test data, the target column holds the historical values against which predictions are compared. Scoring produces a prediction for the target. Models for clustering, feature extraction, association, or anomaly detection do not use a target. Nested columns and columns of unstructured data cannot serve as targets.
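The three roles of the target (training, testing, scoring) can be illustrated with a deliberately trivial classifier. Everything here is an assumption made for the sketch: the BUYS target, the SIZE predictor, the sample rows, and the "most frequent target per SIZE value" rule stand in for a real algorithm.

```python
from collections import Counter, defaultdict

TARGET = "BUYS"  # hypothetical target column

train = [
    {"SIZE": "M", "BUYS": "yes"},
    {"SIZE": "M", "BUYS": "yes"},
    {"SIZE": "L", "BUYS": "no"},
    {"SIZE": "L", "BUYS": "no"},
    {"SIZE": "L", "BUYS": "yes"},
]
test = [
    {"SIZE": "M", "BUYS": "yes"},
    {"SIZE": "L", "BUYS": "yes"},
]


def fit(rows):
    """Learn the most frequent historical target value for each SIZE."""
    by_size = defaultdict(Counter)
    overall = Counter()
    for row in rows:
        by_size[row["SIZE"]][row[TARGET]] += 1
        overall[row[TARGET]] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen SIZE values
    lookup = {size: counts.most_common(1)[0][0] for size, counts in by_size.items()}
    return lookup, default


def predict(model, row):
    """Return the predicted target for a row; the row needs no target value."""
    lookup, default = model
    return lookup.get(row["SIZE"], default)


model = fit(train)  # training: learn from historical target values

# Testing: compare predictions with the historical values in the test data.
predictions = [predict(model, row) for row in test]
accuracy = sum(p == row[TARGET] for p, row in zip(predictions, test)) / len(test)

# Scoring: new data has no target column; the model produces the prediction.
new_prediction = predict(model, {"SIZE": "M"})
print(predictions, accuracy, new_prediction)
```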
Model Signature
The model signature is the set of data attributes used to build the model. When the model is scored, some or all of the attributes in the signature should be present. The model compensates for missing columns as best it can, and when a column has the same name as a signature attribute but a different data type, it attempts to convert the data type. Extra columns that are not part of the signature are ignored. The signature does not include the target or the case ID column.
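The scoring-time behavior described above can be sketched as a small preparation step. This is an illustration under assumptions: the signature contents, the helper `prepare_for_scoring`, and the decision to represent a missing or unconvertible value as `None` are choices made for the example; how a real model compensates for missing columns depends on the algorithm.

```python
# Hypothetical signature: attribute name -> expected Python type.
# The target and the case ID are deliberately absent.
signature = {"AGE": float, "SIZE": str, "INCOME": float}


def prepare_for_scoring(row, signature):
    """Align an incoming row with the model signature.

    - Columns outside the signature are dropped.
    - Columns with a matching name but a different type are converted
      when possible.
    - Missing or unconvertible columns are marked None so that the model
      can handle them however it chooses.
    """
    prepared = {}
    for name, expected_type in signature.items():
        value = row.get(name)
        if value is None:
            prepared[name] = None  # missing column
            continue
        if not isinstance(value, expected_type):
            try:
                value = expected_type(value)  # attempt data type conversion
            except (TypeError, ValueError):
                value = None
        prepared[name] = value
    return prepared


scoring_row = {"CUST_ID": 7, "AGE": "42", "SIZE": "L", "FAVORITE_COLOR": "blue"}
prepared = prepare_for_scoring(scoring_row, signature)
print(prepared)
```

In this run, AGE is converted from text to a number, INCOME is flagged as missing, and both CUST_ID and FAVORITE_COLOR are ignored.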
Model Specifications
Model specifications describe the model attributes and how the algorithm treats them. This information helps users see which transformations were applied to the attributes before the algorithm built the model.
A model attribute can be numerical, categorical, or unstructured text. A numerical attribute can, in theory, take an unlimited number of values, and those values have an implicit order. A categorical attribute takes its values from a finite set of distinct classes or categories that have no inherent order.
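The numerical versus categorical distinction can be made concrete with a small helper that inspects a column's values. The function `describe_attribute` and its rule (Python numbers are numerical, everything else is categorical) are assumptions for this sketch; real systems usually decide from the column's declared data type.

```python
def describe_attribute(values):
    """Classify a column as numerical or categorical and summarize it."""
    present = [v for v in values if v is not None]
    if not present:
        return {"type": "unknown"}
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if is_number:
        # Ordered values: a range is meaningful.
        return {"type": "numerical", "min": min(present), "max": max(present)}
    # Unordered values: only the set of distinct categories is meaningful.
    # Sorting is for stable display and implies no ordering of categories.
    return {"type": "categorical", "categories": sorted(set(map(str, present)))}


age_info = describe_attribute([23, 41.5, None, 35])
size_info = describe_attribute(["M", "L", "M", "S"])
print(age_info)
print(size_info)
```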
In Conclusion
Attributes, or features, are data fields that represent properties of a data object, such as a customer's ID or address. The full set of attributes describing one object is called an attribute vector or feature vector. Attributes are generally classified as either quantitative (numerical values, which may be continuous or discrete) or qualitative (nominal, binary, or ordinal).
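As a closing illustration, a feature vector for one object might be assembled as shown here. The customer record, the field names, and the use of one-hot encoding for the qualitative attribute are assumptions chosen for the sketch; other encodings are equally valid.

```python
def to_feature_vector(record, numeric_fields, categorical_levels):
    """Build a flat numeric feature vector from one record.

    Quantitative fields are copied as floats. Each qualitative field is
    one-hot encoded: one position per known level, 1.0 where it matches.
    """
    vector = [float(record[name]) for name in numeric_fields]
    for name, levels in categorical_levels.items():
        vector.extend(1.0 if record[name] == level else 0.0 for level in levels)
    return vector


customer = {"AGE": 42, "INCOME": 55000, "SIZE": "L"}  # hypothetical record
vector = to_feature_vector(
    customer,
    numeric_fields=["AGE", "INCOME"],
    categorical_levels={"SIZE": ["S", "M", "L"]},
)
print(vector)
```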